Supplemental Figure S1: Process of the BAC and whole-genome shotgun (WGS)
hierarchical strategy. BACs were used in the hierarchical assembly strategy to overcome the
high level of genome heterozygosity. In addition, a series of WGS libraries (170 bp to 40
kbp) was used to build scaffolds and fill gaps.
Supplemental Figure S2: K-mer analyses. A k-mer is a subsequence of K nucleotides taken
iteratively from the reads. The k-mer distribution was bimodal, and the k-mer depth of the
first peak (22) was half that of the second (44), implying that the genome of L. crocea is
rich in heterozygous sites. A read of L bp contains (L-K+1) k-mers if each k-mer is K bp
long. Genome size G is estimated as G = K_num/K_depth, where K_num is the total number of
k-mers and K_depth is the peak k-mer depth. The X-axis is the k-mer depth derived from the
sequenced reads and the Y-axis is the frequency of each k-mer depth.
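The estimate described in the Figure S2 legend can be illustrated with a minimal Python sketch that applies G = K_num/K_depth to the values reported in Supplemental Table S2. The function names are illustrative and are not taken from the original analysis pipeline. The 100 bp read length is an assumption inferred from the ratio of used bases to used reads in that table.

```python
def kmers_per_read(read_length: int, k: int) -> int:
    """A read of L bp contains L - K + 1 k-mers of length K."""
    return read_length - k + 1


def estimate_genome_size(kmer_number: int, peak_depth: int) -> float:
    """Genome size G = K_num / K_depth."""
    return kmer_number / peak_depth


# Values reported in Supplemental Table S2.
K = 17
KMER_NUMBER = 30_423_075_312
PEAK_DEPTH = 44
USED_READS = 362_179_468
READ_LENGTH = 100  # assumption: used bases / used reads in Table S2

if __name__ == "__main__":
    genome_size = estimate_genome_size(KMER_NUMBER, PEAK_DEPTH)
    print(f"k-mers per read: {kmers_per_read(READ_LENGTH, K)}")
    print(f"estimated genome size: {genome_size / 1e6:.0f} Mb")
```

Under the 100 bp assumption, the number of used reads multiplied by the k-mers per read reproduces the k-mer number in Table S2, which suggests the table is internally consistent with the formula in the legend.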
Supplemental Figure S3: Distribution of single-base depth based on short-read
alignment. To validate the completeness of the genome assembly, high-quality reads were
aligned against the assembly using the Burrows-Wheeler Aligner (Li and Durbin 2009).
Supplemental Figure S4: Distribution of intron length, exon number, mRNA length,
exon length, and coding region length in the genome of L. crocea and other related 
species. 
[Figure: Venn diagram with three sets labeled Ab initio, Homolog, and RNAseq]

Supplemental Figure S5: Venn diagram representing the L. crocea gene models 
supported by the ab initio prediction, homology-based methods, and RNAseq-based 
data. 
Supplemental Figure S6: Phylogenetic analysis of crystallins in teleosts. Crystallin
protein sequences of zebrafish were used to predict crystallin genes in seven other fish
species. The phylogenetic tree was constructed by the maximum likelihood method in PAML
(Yang 1997). The khaki, orange, gold, grey, plum, wheat, and pink backgrounds represent
crystallin genes in the genomes of medaka, Atlantic cod, zebrafish, green spotted pufferfish,
three-spined stickleback, Japanese pufferfish, and large yellow croaker, respectively.
Supplemental Figure S7: Expansion of the olfactory receptor (OR)-like genes of the "eta"
group in the L. crocea genome. The circular cladogram was constructed by the maximum
likelihood method in PAML (Yang 1997). The blue, khaki, orange, gold, grey, plum, wheat,
and pink backgrounds represent the OR-like genes of the "eta" group in the genomes of human,
medaka, Atlantic cod, zebrafish, green spotted pufferfish, three-spined stickleback, Japanese
pufferfish, and large yellow croaker, respectively.
Supplemental Figure S8: Expansion of the tripartite motif-containing protein 25 (TRIM25)
gene family in the L. crocea genome. The circular cladogram was constructed by the
maximum likelihood method in PAML (Yang 1997). The blue, khaki, orange, grey, plum,
wheat, and pink backgrounds represent TRIM25 genes in the genomes of human, medaka,
Atlantic cod, green spotted pufferfish, three-spined stickleback, Japanese pufferfish, and large
yellow croaker, respectively.
Supplemental Figure S9: Expansion of the NOD-like receptor family CARD domain
containing 3 (NLRC3) gene family in the L. crocea genome. The circular cladogram was
constructed by the maximum likelihood method in PAML (Yang 1997). The blue, khaki,
orange, grey, plum, wheat, and pink backgrounds represent NLRC3 genes in the genomes of
human, medaka, Atlantic cod, green spotted pufferfish, three-spined stickleback, Japanese
pufferfish, and large yellow croaker, respectively.
Supplemental Figure S10: Differentially expressed genes (DGEs) in L. crocea brains
under hypoxic and normal conditions. Genes with FDR < 0.001 and fold change > 2 were
defined as significant DGEs. (A) The 5564 DGEs that were significantly up-regulated at more
than one time point after hypoxia exposure and not significantly down-regulated at other time
points. (B) The 1948 DGEs that were significantly down-regulated at more than one time point
after hypoxia exposure and not significantly up-regulated at other time points. (C) The 890
DGEs that were significantly up-regulated at some time points and significantly
down-regulated at other time points under hypoxia.
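The grouping described in the Figure S10 legend can be sketched as a small Python function. This is only an illustration under stated assumptions, not the authors' pipeline: expression change is represented as a hypoxia/control ratio, down-regulation is taken as a ratio below 1/2 (a symmetric reading of "fold change > 2"), and the hypothetical `min_points` parameter sets how many significant time points are required, because "more than one time point" in the legend could be read as either one or two.

```python
from typing import List, Optional, Tuple

FDR_CUTOFF = 0.001  # from the figure legend
FOLD_CUTOFF = 2.0   # from the figure legend


def call_direction(ratio: float, fdr: float) -> int:
    """Return +1 (up), -1 (down) or 0 (not significant) for one time point."""
    if fdr >= FDR_CUTOFF:
        return 0
    if ratio > FOLD_CUTOFF:
        return 1
    if ratio < 1.0 / FOLD_CUTOFF:  # assumption: symmetric cutoff for down-regulation
        return -1
    return 0


def classify_gene(series: List[Tuple[float, float]],
                  min_points: int = 1) -> Optional[str]:
    """Assign a gene to panel 'A', 'B' or 'C', or None if it fits no panel.

    series holds one (ratio, FDR) pair per time point.
    """
    calls = [call_direction(ratio, fdr) for ratio, fdr in series]
    up, down = calls.count(1), calls.count(-1)
    required = max(min_points, 1)
    if up > 0 and down > 0:
        return "C"  # up at some time points, down at others
    if up >= required:
        return "A"  # up only
    if down >= required:
        return "B"  # down only
    return None


if __name__ == "__main__":
    toy = {
        "gene_up": [(3.0, 1e-5), (1.2, 0.5)],
        "gene_down": [(0.3, 1e-4), (0.4, 1e-6)],
        "gene_mixed": [(4.0, 1e-6), (0.2, 1e-6)],
    }
    for name, series in toy.items():
        print(name, classify_gene(series))
```

The gene names and values in the usage example are invented toy data and do not come from the study.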
Supplemental Figure S11: Number of differentially expressed genes (DGEs) at different
time points under hypoxia. Significant DGEs are defined as those with fold change > 2 and
FDR < 0.001. The labels 1 h, 3 h, 6 h, 12 h, 24 h, and 48 h indicate the time points at which
brain tissues were harvested under hypoxia. Brain tissues harvested at 0 h were used as
controls.
Supplemental Figure S12: The dynamic expression patterns of the genes involved in the
potential nerve-endocrine-immunity network in the L. crocea brain under hypoxia.

Hypoxia can induce the expression of endothelin-1 (ET-1) and adrenomedullin (ADM), which
then promote the expression of the inflammatory cytokines IL-6 and TNF-α in the brain and
may induce cerebral inflammation and injury. ET-1/ADM and IL-6/TNF-α form a
positive-feedback pathway that amplifies cerebral inflammation. HPA axis glucocorticoids
and SOCS family members (SOCS-1 and SOCS-3) can inhibit IL-6/TNF-α expression and
constitute negative-feedback regulatory loops with IL-6/TNF-α to modulate cerebral
inflammation. The green arrow represents promotion of expression, and the red interrupted
line represents inhibition of expression.
Supplemental Figure S13: Expansion of the N-acetylgalactosaminyl transferase
(GALNT) gene family in the L. crocea genome. The distribution of GALNTs 1-14 in the genomes
of human, zebrafish, stickleback, Japanese pufferfish, green spotted pufferfish, and large
yellow croaker is shown.
Supplemental Figure S14: Enrichment of Gene Ontology categories for skin mucus
proteins. We applied EnrichPipeline (Chen et al. 2010) to extract Gene Ontology annotation
information with P < 0.01. The functions are summarized in three main
categories: biological process, cellular component, and molecular function.
Supplemental Table S1: Summary of BACs used in the L. crocea genome project

Average length of BAC (kbp) | BAC number | 96-well plates | Sequence bases (Gbp) | Average per BAC (x) | Genome depth (x)
120 | 42,528 | 443 | 324.73 | 63.63 | 464
Supplemental Table S2: Summary of k-mer analysis

k-mer size | k-mer number | Peak depth | Genome size | Used base | Used read | Coverage of genome (x)
17 | 30,423,075,312 | 44 | 691,433,530 | 36,217,946,800 | 362,179,468 | 52

Based on the k-mer analysis, the genome size of L. crocea is calculated to be 691 Mb.
Supplemental Table S3: Statistics of BAC sequences used for merging

Statistic | Contig length (bp) | Contig number | Scaffold length (bp) | Scaffold number
N90 | 1,096 | 563,506 | 3,360 | 150,285
N80 | 2,057 | 363,547 | 7,552 | 89,358
N70 | 3,299 | 248,223 | 13,256 | 58,373
N60 | 4,781 | 172,565 | 20,606 | 39,642
N50 | 6,560 | 118,845 | 29,220 | 26,987
N40 | 8,741 | 79,019 | 40,174 | 17,904
N30 | 11,514 | 48,980 | 52,841 | 11,232
N20 | 15,505 | 26,348 | 73,581 | 6,228
N10 | 22,331 | 9,952 | 100,567 | 2,634
Total length (bp) | 3,006,049,398 | | 3,098,436,274 |
Max length (bp) | 98,562 | | 149,895 |
Number (>500 bp) | | 893,977 | | 388,748
Number (>2000 bp) | | 371,497 | | 197,679

BAC sequences longer than 500 bp were merged.
Supplemental Table S4: Information on whole-genome shotgun reads

Insert size | Average read length (bp) | Total data (Gb) | Sequence depth (x) | Physical depth (x)
170 bp | 100 | 13.95 | 19.62 | 49.06
500 bp | 100 | 22.27 | 31.32 | 26.62
2 kbp | 49 | 18.87 | 26.54 | 541.68
5 kbp | 49 | 4.84 | 6.80 | 346.97
10 kbp | 49 | 6.28 | 8.83 | 901.02
20 kbp | 49 | 3.30 | 4.64 | 946.35
40 kbp | 49 | 0.97 | 1.36 | 556.85
Total | | 70.48 | 99.11 | 3368.55
Supplemental Table S5: Statistics of final assembly

Statistic | Contig length (bp) | Contig number | Scaffold length (bp) | Scaffold number
N90 | 14,160 | 11,390 | 285,329 | 672
N80 | 25,118 | 7,939 | 463,062 | 487
N70 | 36,275 | 5,754 | 662,826 | 362
N60 | 48,542 | 4,180 | 856,499 | 272
N50 | 63,110 | 2,980 | 1,034,540 | 200
N40 | 79,292 | 2,045 | 1,257,841 | 140
N30 | 100,009 | 1,298 | 1,596,367 | 91
N20 | 130,560 | 711 | 1,997,749 | 53
N10 | 180,093 | 274 | 2,413,492 | 22
Total length (bp) | 661,327,267 | | 678,964,076 |
Max length (bp) | 716,891 | | 4,914,789 |
Number (>500 bp) | | 20,385 | | 6,019
Number (>1000 bp) | | 27,015 | | 3,146
Number (>2000 bp) | | 26,882 | | 2,251
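
The N-statistics reported in Supplemental Tables S3 and S5 follow the standard Nx definition, which the following minimal Python sketch illustrates. It assumes that the "Number" column gives the count of the longest sequences needed to reach x% of the total length; that reading is an interpretation of the table, and the function name `nx_stats` is illustrative.

```python
from typing import Iterable, Tuple


def nx_stats(lengths: Iterable[int], x: int = 50) -> Tuple[int, int]:
    """Return (Nx length, number of sequences) for a set of sequence lengths.

    Sequences are sorted from longest to shortest and accumulated until the
    running sum reaches x% of the total length. The length of the sequence
    that crosses the threshold is the Nx value.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no sequence lengths supplied")


if __name__ == "__main__":
    toy_lengths = [10, 8, 6, 4, 2]  # invented toy data, total length 30
    for x in (90, 50, 10):
        length, count = nx_stats(toy_lengths, x)
        print(f"N{x}: length={length}, number={count}")
```

With this definition the Nx length grows and the sequence count shrinks as x decreases from 90 to 10, which matches the trend of the rows in both tables.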
Supplemental Table S6: Genome assembly validation by transcript mapping

Transcripts length | Total transcripts | Aligned: number of transcripts | Aligned: percentage (%) | 50% aligned: number of transcripts | 50% aligned: percentage (%) | 80% aligned: number of transcripts | 80% aligned: percentage (%)
>500 bp | 39,106 | 38,520 | 98.50 | 38,174 | 97.62 | 37,460 | 95.79
>1000 bp | 18,184 | 17,966 | 98.80 | 17,786 | 97.81 | 17,421 | 95.80

The male and female transcriptomes were each derived from eleven mixed tissues. The short
sequencing reads were then assembled into transcripts. Transcripts were mapped to the L. crocea
genome by BLAT (Kent 2002) to validate the completeness of the genome assembly.
Supplemental Table S7: Summary of genome assembly of seven sequenced teleost species

Species | Total length of scaffolds (bp) | Total length of contigs (bp) | Percentage of gaps | N50 of scaffolds (bp) | N50 of contigs (bp) | Number of scaffolds | Number of contigs
Larimichthys crocea | 678,964,076 | 661,327,267 | 2.60% | 1,034,540 | 63,110 | 6,019 | 27,015
Gadus morhua | 832,114,588 | 607,869,276 | 26.95% | 136,353 | 2,310 | 398,859 | 555,245
Takifugu rubripes | 393,312,790 | 351,017,437 | 10.75% | 858,115 | 49,304 | 7,214 | 33,204
Oryzias latipes | 869,000,216 | 700,384,697 | 19.40% | 29,908,082 | 9,628 | 7,189 | 134,399
Tetraodon nigroviridis | 358,618,246 | 302,293,082 | 15.71% | 13,390,619 | 30,260 | 27 | 33,235
Gasterosteus aculeatus | 461,533,321 | 446,627,734 | 3.23% | 18,115,788 | 83,204 | 1,842 | 16,966
Danio rerio | 1,412,464,843 | 1,409,741,015 | 0.19% | 54,093,808 | 1,073,451 | 1,133 | 28,972

Assemblies of the six other teleost species were downloaded from Ensembl (release 74). Since
Oryzias latipes, Tetraodon nigroviridis, Gasterosteus aculeatus, and Danio rerio were anchored
to chromosomes, their scaffold N50 is larger than 10 Mb.
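
The "Percentage of gaps" column in Supplemental Table S7 appears to be the share of scaffold length not covered by contigs. The short Python sketch below shows that relationship for a few rows. The formula is inferred from the reported numbers rather than stated in the source, and the function name is illustrative.

```python
def gap_percentage(scaffold_bp: int, contig_bp: int) -> float:
    """Percentage of total scaffold length that is not covered by contigs."""
    return 100 * (scaffold_bp - contig_bp) / scaffold_bp


# (total scaffold length, total contig length) as reported in Table S7.
ASSEMBLIES = {
    "Larimichthys crocea": (678_964_076, 661_327_267),
    "Gadus morhua": (832_114_588, 607_869_276),
    "Takifugu rubripes": (393_312_790, 351_017_437),
    "Danio rerio": (1_412_464_843, 1_409_741_015),
}

if __name__ == "__main__":
    for species, (scaffold_bp, contig_bp) in ASSEMBLIES.items():
        print(f"{species}: {gap_percentage(scaffold_bp, contig_bp):.2f}% gaps")
```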
Supplemental Table S8: Summary of repetitive elements in the L. crocea genome

Subtype | RepeatMasker #base | RepeatMasker % genome | ProteinMasker #base | ProteinMasker % genome | Repeatscout #base | Repeatscout % genome | Combination #base | Combination % genome
DNA | 19,575,778 | 2.88 | 4,206,662 | 0.62 | 25,051,324 | 3.69 | 35,335,543 | 5.20
LINE | 11,874,487 | 1.75 | 10,982,979 | 1.62 | 13,995,684 | 2.06 | 19,862,941 | 2.93
LTR | 9,754,813 | 1.44 | 5,727,464 | 0.84 | 8,078,576 | 1.19 | 14,945,772 | 2.20
Other | 7,999 | 0.00 | 0 | 0.00 | 0 | 0.00 | 7,999 | 0.00
SINE | 2,268,995 | 0.33 | 0 | 0.00 | 2,501,774 | 0.37 | 3,687,458 | 0.54
Satellite | 0 | 0.00 | 0 | 0.00 | 2,755,142 | 0.41 | 2,755,142 | 0.41
Simple repeat | 0 | 0.00 | 0 | 0.00 | 3,945,489 | 0.58 | 3,945,489 | 0.58
Unknown | 541,527 | 0.08 | 0 | 0.00 | 41,857,750 | 6.16 | 42,374,776 | 6.24
Total | 44,023,599 | 6.48 | 20,917,105 | 3.08 | 98,185,739 | 14.46 | 122,915,120 | 18.10
Supplemental Table S9: Top ten transposable elements (TE) in seven teleost species

Fam ID | Larimichthys crocea | Danio rerio | Gadus morhua | Gasterosteus aculeatus | Oryzias latipes | Takifugu rubripes | Tetraodon nigroviridis
SINE/V | 5,616 | 152,886 | 168 | 4,851 | 14,797 | 1,298 | 1,401
SINE/MIR | 3,833 | 30 | 3,797 | 3,044 | 6,296 | 49 | 27
DNA/DNA | 3,831 | 322,831 | 1,366 | 638 | 2,517 | 174 | 41
DNA/TcMar | 3,630 | 188,595 | 1,131 | 2,564 | 20,945 | 3,708 | 1,395
SINE/Mermaid | 3,115 | 39 | 71 | 249 | 7,474 | 3,321 | 374
LTR/Gypsy | 2,580 | 32,588 | 2,119 | 6,135 | 3,027 | 3,698 | 432
LTR/ERVK | 1,776 | 2,126 | 8,475 | 1,102 | 157 | 955 | 1,205
DNA/hAT | 1,634 | 310,603 | 1,375 | 1,143 | 5,919 | 2,701 | 1,615
LINE/L2 | 1,266 | 9,323 | 1,068 | 1,148 | 3,237 | 1,281 | 46
LINE/RTE | 1,001 | 1,458 | 490 | 549 | 4,743 | 1,188 | 541

TEs were identified by alignment and rebased by RepeatMasker. To confirm the completeness
of the TEs, we chose TEs with length >30 bp, aligning ratio >0.2, and divergence >50% for
rebasing the reference TE, respectively.
Supplemental Table S10: Comparison of repeat content from nine sequenced vertebrate
species

Species | Genome length (Mbp) | Repeat content from RepeatMasker (%) | Repeat content from publication (%)
Larimichthys crocea | 679 | 6.02 | 18.1
Danio rerio | 1,412 | 49.48 | 52.2
Gadus morhua | 832 | 10.66 | 25.4
Gasterosteus aculeatus | 462 | 7.08 | 25.2
Oryzias latipes | 869 | 8.56 | 17.5
Takifugu rubripes | 393 | 7.74 |
Tetraodon nigroviridis | 359 | 4.60 |
Mus musculus | 2,717 | 37.69 | 38
Homo sapiens | 3,096 | 40.08 | 45

Our results showed that L. crocea had a relatively compact genome structure.
Supplemental Table S11: Statistics of predicted protein-coding genes

Evidence | Gene set | Gene number | Complete ORF | Complete ORF (%) | Single-exon genes | Single-exon genes (%) | Average transcript length (bp) | Average CDS length (bp) | Average exons per gene | Average exon length (bp) | Average intron length (bp)
Denovo | Augustus | 30,182 | 30,014 | 99.44 | 3,873 | 12.83 | 10,123 | 1,428 | 7.99 | 179 | 1,243
Denovo | Genscan | 38,196 | 38,196 | 100.00 | 3,061 | 8.01 | 12,441 | 1,529 | 8.30 | 184 | 1,496
Denovo | SNAP | 65,053 | 61,944 | 95.22 | 4,061 | 6.24 | 16,083 | 1,104 | 7.80 | 142 | 2,203
RNAseq | RNAseq | 39,528 | 22 | 0.06 | 0 | 0.00 | 11,227 | 2,151 | 8.06 | 267 | 1,285
Homolog | Danio rerio | 31,003 | 4,245 | 13.69 | 6,750 | 21.77 | 7,870 | 1,369 | 7.18 | 191 | 1,052
Homolog | Gasterosteus aculeatus | 31,883 | 4,727 | 14.83 | 8,478 | 26.59 | 7,417 | 1,224 | 7.03 | 174 | 1,026
Homolog | Homo sapiens | 22,525 | 2,020 | 8.97 | 4,224 | 18.75 | 9,213 | 1,390 | 8.20 | 169 | 1,086
Homolog | Oreochromis niloticus | 34,331 | 6,936 | 20.20 | 8,387 | 24.43 | 7,523 | 1,299 | 7.04 | 184 | 1,030
Homolog | Oryzias latipes | 34,788 | 4,169 | 11.98 | 9,248 | 26.58 | 6,608 | 1,176 | 6.43 | 183 | 1,001
Homolog | Takifugu rubripes | 27,029 | 3,802 | 14.07 | 5,311 | 19.65 | 8,802 | 1,396 | 7.94 | 176 | 1,067
Homolog | Tetraodon nigroviridis | 25,342 | 3,911 | 15.43 | 3,756 | 14.82 | 8,831 | 1,401 | 8.24 | 170 | 1,026
Glean | Glean | 26,922 | 26,039 | 96.72 | 2,867 | 10.65 | 13,508 | 1,716 | 9.49 | 181 | 1,390
Final | Final (filt denovo genes with rpkm<1) | 25,401 | 24,523 | 96.54 | 2,321 | 9.14 | 13,816 | 1,766 | 9.91 | 178 | 1,353
Closely related species | Danio rerio | 25,663 | 19,100 | 74.43 | 1,636 | 6.37 | 24,727 | 1,583 | 9.28 | 170 | 2,794
Closely related species | Gasterosteus aculeatus | 20,756 | 8,146 | 39.25 | 1,085 | 5.23 | 8,577 | 1,539 | 10.41 | 148 | 748
Closely related species | Homo sapiens | 20,087 | 19,111 | 95.14 | 2,188 | 10.89 | 51,959 | 1,608 | 9.46 | 170 | 5,952
Closely related species | Oreochromis niloticus | 21,437 | 12,917 | 60.26 | 1,067 | 4.98 | 14,906 | 1,714 | 10.90 | 157 | 1,332
Closely related species | Oryzias latipes | 19,658 | 6,978 | 35.50 | 1,009 | 5.13 | 12,428 | 1,505 | 10.21 | 147 | 1,186
Closely related species | Takifugu rubripes | 18,508 | 5,487 | 29.65 | 643 | 3.47 | 7,719 | 1,651 | 11.05 | 149 | 604
Closely related species | Tetraodon nigroviridis | 19,570 | 6,837 | 34.94 | 777 | 3.97 | 6,191 | 1,512 | 10.52 | 144 | 492
Supplemental Table S12: Functional classification of L. crocea genes

Database | Number | Percent
InterPro | 20943 | 82.45%
KEGG | 9951 | 39.18%
Swissprot | 23178 | 91.25%
TrEMBL | 24671 | 97.13%
Annotated | 24729 | 97.35%
Unannotated | 672 | 2.65%
Supplemental Table S13: Summary of gene families in the genomes of nine sequenced
vertebrate species by Treefam

Species | #Total genes | #Unclustered genes | #Families | #Unique families | Ave. genes per family
Larimichthys crocea | 25,387 | 1,687 | 14,698 | 215 | 1.61
Danio rerio | 25,618 | 1,218 | 14,378 | 112 | 1.7
Gadus morhua | 20,490 | 567 | 14,024 | 12 | 1.42
Gallus gallus | 16,344 | 2,987 | 11,840 | 99 | 1.13
Gasterosteus aculeatus | [illegible] | [illegible] | [illegible] | 10 | [illegible]
Homo sapiens | 19,959 | 3,253 | 13,215 | 421 | 1.26
Oryzias latipes | 19,529 | 1,040 | 13,105 | 80 | 1.41
Takifugu rubripes | 18,453 | 232 | 12,888 | 7 | 1.41
Tetraodon nigroviridis | 19,520 | 828 | 13,206 | 49 | 1.42
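
The averages in the last column of Supplemental Table S13 appear consistent with dividing the clustered genes (total minus unclustered) by the number of families. This relationship is inferred from the reported numbers and is not stated in the source, so the Python sketch below should be read as a consistency check under that assumption. The function name is illustrative.

```python
def average_genes_per_family(total_genes: int, unclustered: int, families: int) -> float:
    """Assumed definition: clustered genes divided by number of families."""
    return (total_genes - unclustered) / families


# (total genes, unclustered genes, families) as reported in Table S13.
SPECIES = {
    "Larimichthys crocea": (25_387, 1_687, 14_698),
    "Gadus morhua": (20_490, 567, 14_024),
    "Gallus gallus": (16_344, 2_987, 11_840),
    "Homo sapiens": (19_959, 3_253, 13_215),
}

if __name__ == "__main__":
    for name, (total, unclustered, families) in SPECIES.items():
        value = average_genes_per_family(total, unclustered, families)
        print(f"{name}: {value:.2f}")
```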
Supplemental Table S14: Gene Ontology of expanded gene families in the L. crocea genome

GO ID | GO Term | P-value | Adjusted P-value
GO:0005911 | cell-cell junction | 5.10E-72 | 2.50E-69
GO:0044430 | cytoskeletal part | 2.18E-35 | 2.67E-33
GO:0005923 | tight junction | 1.45E-30 | 8.85E-29
GO:0046872 | metal ion binding | 2.15E-30 | 1.17E-28
GO:0003774 | motor activity | 1.06E-27 | 4.33E-26
GO:0016459 | myosin complex | 1.47E-23 | 5.53E-22
GO:0007156 | homophilic cell adhesion | 1.27E-22 | 4.45E-21
GO:0004222 | metalloendopeptidase activity | 4.21E-19 | 1.21E-17
GO:0046914 | transition metal ion binding | 8.55E-19 | 2.33E-17
GO:0005856 | cytoskeleton | 9.35E-18 | 2.41E-16
GO:0008270 | zinc ion binding | 6.54E-17 | 1.53E-15
GO:0005198 | structural molecule activity | 3.50E-14 | 6.94E-13
GO:0008237 | metallopeptidase activity | 3.54E-14 | 6.94E-13
GO:0044459 | plasma membrane part | 1.75E-13 | 2.96E-12
GO:0004499 | N,N-dimethylaniline monooxygenase activity | 1.34E-11 | 1.99E-10
GO:0016165 | lipoxygenase activity | 2.96E-11 | 4.15E-10
GO:0004930 | G-protein coupled receptor activity | 1.95E-10 | 2.46E-09
GO:0005833 | hemoglobin complex | 8.04E-10 | 9.39E-09
GO:0030246 | carbohydrate binding | 8.53E-10 | 9.59E-09
GO:0016702 | oxidoreductase activity, acting on single donors with incorporation of molecular oxygen, incorporation of two atoms of oxygen | 1.98E-09 | 2.07E-08
GO:0006691 | leukotriene metabolic process | 5.05E-09 | 4.67E-08
GO:0015671 | oxygen transport | 5.75E-09 | 5.12E-08
GO:0019825 | oxygen binding | 1.01E-08 | 8.55E-08
GO:0005529 | sugar binding | 1.37E-07 | 1.07E-06
GO:0003956 | NAD(P)+-protein-arginine ADP-ribosyltransferase activity | 1.83E-07 | 1.40E-06
GO:0043234 | protein complex | 3.36E-07 | 2.42E-06
GO:0005488 | binding | 4.42E-07 | 3.14E-06
GO:0006915 | apoptotic process | 5.81E-07 | 3.95E-06
GO:0005509 | calcium ion binding | 7.51E-07 | 5.04E-06
GO:0004668 | protein-arginine deiminase activity | 9.35E-07 | 5.87E-06
GO:0050661 | NADP binding | 1.05E-06 | 6.54E-06
GO:0042981 | regulation of apoptotic process | 2.48E-06 | 1.43E-05
GO:0004888 | transmembrane signaling receptor activity | 2.82E-06 | 1.52E-05
GO:0018101 | peptidyl-citrulline biosynthetic process from peptidyl-arginine | 3.70E-06 | 1.89E-05
GO:0016503 | pheromone receptor activity | 3.70E-06 | 1.89E-05
GO:0043232 | intracellular non-membrane-bounded organelle | 1.37E-05 | 6.03E-05
GO:0005874 | microtubule | 1.88E-05 | 8.16E-05
GO:0005922 | connexon complex | 2.99E-05 | 1.26E-04
GO:0051258 | protein polymerization | 7.04E-05 | 2.83E-04
GO:0004869 | cysteine-type endopeptidase inhibitor activity | 8.32E-05 | 3.31E-04
GO:0007155 | cell adhesion | 8.65E-05 | 3.37E-04
GO:0016042 | lipid catabolic process | 8.79E-05 | 3.39E-04
GO:0000228 | nuclear chromosome | 9.89E-05 | 3.76E-04
GO:0044464 | cell part | 1.22E-04 | 4.51E-04
GO:0044446 | intracellular organelle part | 1.65E-04 | 5.90E-04
GO:0004872 | receptor activity | 1.70E-04 | 5.99E-04
GO:0017111 | nucleoside-triphosphatase activity | 2.50E-04 | 8.65E-04
GO:0006471 | protein ADP-ribosylation | 2.88E-04 | 9.76E-04
GO:0005525 | GTP binding | 5.33E-04 | 1.71E-03
GO:0009395 | phospholipid catabolic process | 6.28E-04 | 1.97E-03
GO:0071822 | protein complex subunit organization | 7.31E-04 | 2.24E-03
GO:0050660 | flavin adenine dinucleotide binding | 7.93E-04 | 2.38E-03
GO:0007018 | microtubule-based movement | 1.37E-03 | 4.01E-03
GO:0005506 | iron ion binding | 1.53E-03 | 4.44E-03
GO:0004623 | phospholipase A2 activity | 1.92E-03 | 5.45E-03
GO:0008378 | galactosyltransferase activity | 2.03E-03 | 5.70E-03
GO:0032991 | macromolecular complex | 2.96E-03 | 8.16E-03
GO:0007017 | microtubule-based process | 3.11E-03 | 8.52E-03
GO:0004952 | dopamine receptor activity | 3.42E-03 | 9.25E-03
GO:0008146 | sulfotransferase activity | 3.44E-03 | 9.25E-03
GO:0004620 | phospholipase activity | 3.67E-03 | 9.72E-03
GO:0004175 | endopeptidase activity | 5.46E-03 | 1.40E-02
GO:0007050 | cell cycle arrest | 5.57E-03 | 1.41E-02
GO:0004683 | calmodulin-dependent protein kinase activity | 8.04E-03 | 2.00E-02
GO:0005272 | sodium channel activity | 8.45E-03 | 2.09E-02
GO:0007049 | cell cycle | 9.74E-03 | 2.39E-02
GO:0007186 | G-protein coupled receptor signaling pathway | 9.86E-03 | 2.39E-02
GO:0044255 | cellular lipid metabolic process | 1.01E-02 | 2.43E-02
GO:0044425 | membrane part | 1.02E-02 | 2.44E-02
GO:0045028 | G-protein coupled purinergic nucleotide receptor activity | 1.07E-02 | 2.53E-02
GO:0008158 | hedgehog receptor activity | 1.38E-02 | 3.22E-02
GO:0005515 | protein binding | 1.70E-02 | 3.94E-02
Supplemental Table S15: Gene Ontology of contracted gene families in the L. crocea genome

GO ID | GO Term | P-value | Adjusted P-value
GO:0000786 | nucleosome | 2.78E-15 | 1.29E-13
GO:0006334 | nucleosome assembly | 5.48E-15 | 1.29E-13
GO:0034622 | cellular macromolecular complex assembly | 1.03E-14 | 2.16E-13
GO:0007186 | G-protein coupled receptor signaling pathway | 8.69E-10 | 6.32E-09
GO:0004984 | olfactory receptor activity | 1.85E-09 | 1.25E-08
GO:0043232 | intracellular non-membrane-bounded organelle | 7.19E-08 | 4.12E-07
GO:0044446 | intracellular organelle part | 4.06E-07 | 2.19E-06
GO:0008417 | fucosyltransferase activity | 5.19E-07 | 2.72E-06
GO:0032991 | macromolecular complex | 3.21E-06 | 1.64E-05
GO:0009987 | cellular process | 6.83E-06 | 3.40E-05
GO:0016021 | integral to membrane | 1.39E-05 | 6.55E-05
GO:0043231 | intracellular membrane-bounded organelle | 4.22E-05 | 1.85E-04
GO:0003677 | DNA binding | 5.52E-05 | 2.32E-04
GO:0005634 | nucleus | 6.43E-05 | 2.64E-04
GO:0044424 | intracellular part | 1.47E-04 | 5.44E-04
GO:0043229 | intracellular organelle | 1.53E-04 | 5.46E-04
GO:0090304 | nucleic acid metabolic process | 2.05E-04 | 7.06E-04
GO:0044260 | cellular macromolecule metabolic process | 3.21E-04 | 1.08E-03
GO:0006486 | protein glycosylation | 8.44E-04 | 2.70E-03
GO:0043170 | macromolecule metabolic process | 9.73E-04 | 2.97E-03
GO:0016020 | membrane | 1.09E-03 | 3.27E-03
GO:0050794 | regulation of cellular process | 3.01E-03 | 8.35E-03
GO:0005783 | endoplasmic reticulum | 1.17E-02 | 2.88E-02
GO:0035014 | phosphatidylinositol 3-kinase regulator activity | 1.50E-02 | 3.60E-02
GO:0044238 | primary metabolic process | 1.69E-02 | 4.00E-02
Supplemental Table S16: Positive selection genes in the L. crocea genome

Gene_ID              Function
Lcro_GLEAN_10004649  40S ribosomal protein S18
Lcro_GLEAN_10020306  55 kDa erythrocyte membrane protein
Lcro_GLEAN_10021617  6-phosphofructokinase type C
Lcro_GLEAN_10020843  A disintegrin and metalloproteinase with thrombospondin motifs 6
Lcro_GLEAN_10017210  Actin-related protein 2/3 complex subunit 5
Lcro_GLEAN_10008076  Actin-related protein 6
Lcro_GLEAN_10011849  Adenosylhomocysteinase B
Lcro_GLEAN_10018601  Adenylosuccinate synthetase isozyme 2
Lcro_GLEAN_10010489  Alcohol dehydrogenase class-3
Lcro_GLEAN_10007833  Ankyrin repeat and SOCS box protein 7
Lcro_GLEAN_10014551  AP-1 complex subunit mu-1
Lcro_GLEAN_10013443  Arginine-glutamic acid dipeptide repeats protein
Lcro_GLEAN_10002004  ATP-dependent RNA helicase DDX19B
Lcro_GLEAN_10013668  Beta-chimaerin
Lcro_GLEAN_10015172  Calcium-dependent secretion activator 1
Lcro_GLEAN_10011303  cAMP and cAMP-inhibited cGMP 3',5'-cyclic phosphodiesterase 10A
Lcro_GLEAN_10022363  Casein kinase I isoform gamma-1
Lcro_GLEAN_10013367  CDK5 and ABL1 enzyme substrate 1
Lcro_GLEAN_10017766  Coiled-coil domain-containing protein 58
Lcro_GLEAN_10018252  Collagen alpha-3(IX) chain
Lcro_GLEAN_10002872  Deoxyhypusine synthase
Lcro_GLEAN_10026530  Dual specificity protein kinase CLK2
Lcro_GLEAN_10019946  E3 ubiquitin-protein ligase SMURF1
Lcro_GLEAN_10014912  Endoplasmic reticulum-Golgi intermediate compartment protein 3
Lcro_GLEAN_10015138  Ephrin type-A receptor 3
Lcro_GLEAN_10005594  Exocyst complex component 4
Lcro_GLEAN_10022003  F-actin-capping protein subunit beta isoforms 1 and 2
Lcro_GLEAN_10001581  F-box only protein 32
Lcro_GLEAN_10005953  G1/S-specific cyclin-D2
Lcro_GLEAN_10009848  Gamma-aminobutyric acid receptor subunit rho-3
Lcro_GLEAN_10019755  Gamma-tubulin complex component 5
Lcro_GLEAN_10008964  Gap junction beta-1 protein
Lcro_GLEAN_10010879  Glycine amidinotransferase, mitochondrial
Lcro_GLEAN_10022207  Glycylpeptide N-tetradecanoyltransferase 2
Lcro_GLEAN_10025336  Insulin-like growth factor 2 mRNA-binding protein 3
Lcro_GLEAN_10002040  Large neutral amino acids transporter small subunit 2
Lcro_GLEAN_10020640  Liprin-alpha-2
Lcro_GLEAN_10018775  Mediator of RNA polymerase II transcription subunit 14
Lcro_GLEAN_10020068  Membrane-bound transcription factor site-1 protease
Lcro_GLEAN_10003950  Metallophosphoesterase MPPED2
Lcro_GLEAN_10003121  Mitochondrial fission 1 protein
Lcro_GLEAN_10020274  Mothers against decapentaplegic homolog 1
Lcro_GLEAN_10025732  Multidrug resistance-associated protein 1
Lcro_GLEAN_10017264  Myosin light chain 1, skeletal muscle isoform
Lcro_GLEAN_10002126  Myotubularin-related protein 9
Lcro_GLEAN_10017786  N-alpha-acetyltransferase 50
Lcro_GLEAN_10011312  Netrin receptor UNC5B
Lcro_GLEAN_10021316  Neuronal membrane glycoprotein M6-b
Lcro_GLEAN_10018924  Nuclear pore complex protein Nup93
Lcro_GLEAN_10015268  Ornithine decarboxylase antizyme 1
Lcro_GLEAN_10020960  PAB-dependent poly(A)-specific ribonuclease subunit 2
Lcro_GLEAN_10026833  Paired box protein Pax-3
Lcro_GLEAN_10010560  PHD finger protein 10
Lcro_GLEAN_10017832  Phosphatidylinositol 3,4,5-trisphosphate-dependent Rac exchanger 2 protein
Lcro_GLEAN_10004465  Pre-mRNA-splicing factor CWC22 homolog
Lcro_GLEAN_10025250  Protein FAM49A
Lcro_GLEAN_10012664  Protein Wnt-2
Lcro_GLEAN_10018964  Protein Wnt-3a
Lcro_GLEAN_10002360  Ras-related protein Rab-4B
Lcro_GLEAN_10015992  RNA polymerase II subunit A C-terminal domain phosphatase SSU72
Lcro_GLEAN_10002022  rRNA 2'-O-methyltransferase fibrillarin
Lcro_GLEAN_10021731  Septin-2
Lcro_GLEAN_10024609  Serine/threonine-protein kinase A-Raf
Lcro_GLEAN_10026438  Serine/threonine-protein kinase B-raf
Lcro_GLEAN_10002810  Serine/threonine-protein phosphatase 2A 56 kDa regulatory subunit epsilon isoform
Lcro_GLEAN_10019423  Small G protein signaling modulator 3
Lcro_GLEAN_10015044  SNW domain-containing protein 1
Lcro_GLEAN_10009105  Sodium/potassium/calcium exchanger 4
Lcro_GLEAN_10025238  Sodium/potassium-transporting ATPase subunit beta-1-interacting protein 4
Lcro_GLEAN_10001011  Sphingosine 1-phosphate receptor 1
Lcro_GLEAN_10013245  Spindlin-1
Lcro_GLEAN_10014414  Stathmin-4
Lcro_GLEAN_10019921  Sterol-4-alpha-carboxylate 3-dehydrogenase, decarboxylating
Lcro_GLEAN_10020234  Syntaxin-binding protein 1
Lcro_GLEAN_10016560  Syntaxin-binding protein 5-like
Lcro_GLEAN_10009242  T-complex protein 1 subunit delta
Lcro_GLEAN_10018109  Tetraspanin-18
Lcro_GLEAN_10010087  Tetratricopeptide repeat protein 13
Lcro_GLEAN_10015907  Thrombospondin-2
Lcro_GLEAN_10024405  Transcription elongation factor 1 homolog
Lcro_GLEAN_10016129  Ubiquitin-conjugating enzyme E2 W
Lcro_GLEAN_10017384  Ubiquitin-like modifier-activating enzyme 1
Lcro_GLEAN_10021238  Ubiquitin-protein ligase E3C
Lcro_GLEAN_10015446  Unconventional myosin-VI
Lcro_GLEAN_10009254 UTP-glucose-1-phosphate uridylyltransferase
Lcro_GLEAN_10025133 Vam6/Vps39-like protein
Lcro_GLEAN_10006179 Vesicular glutamate transporter 1
Lcro_GLEAN_10025438 WD repeat and FYVE domain-containing protein 3
Lcro_GLEAN_10021755 WD repeat-containing protein mio
Lcro_GLEAN_10020092 Zinc finger protein 319
Lcro_GLEAN_10005441 Zinc finger protein 536
Lcro_GLEAN_10001014 Zinc transporter 7
Supplemental Table S17: Copy number of vision-related genes in seven sequenced teleost species

Gene      Larimichthys  Danio  Gadus   Gasterosteus  Oryzias  Takifugu  Tetraodon
name      crocea        rerio  morhua  aculeatus     latipes  rubripes  nigroviridis
crygm2b*  12            3      6       1             8        4         4
cryba1*   5             4      4       4             3        3         2
crybb3*   3             1      2       2             2        2         2
rdh12     20            18     11      14            12       11        11
arl6      27            17     15      21            19       22        17
slc17a6b  9             7      5       5             7        7         4
unc119b   7             4      3       3             4        3         3



Genes are abbreviated as crygm2b: crystallin gamma M2b; cryba1: crystallin beta A1; crybb3: crystallin beta B3; rdh12: retinol dehydrogenase 12; arl6: ADP-ribosylation factor-like 6; slc17a6b: solute carrier family 17, member 6b; unc119b: unc-119 homolog B.

* Several crystallin genes (crygm2b, cryba1, and crybb3), which encode proteins that maintain the transparency and refractive index of the lens (Chen et al. 2014), were markedly expanded in the genome of L. crocea relative to those of other sequenced teleosts. The specific expansion of these crystallin genes may improve photosensitivity by increasing lens transparency, thereby helping the fish to find food and avoid predation underwater.
Supplemental Table S18: Olfactory receptor-like gene repertoire in seven sequenced teleost species

                        Air           Water                       Air/Water
Species name            alpha  gamma  delta  epsilon  zeta  eta*  beta       Total
Larimichthys crocea     0      0      66     4        10    30    2          112
Gadus morhua            0      1      65     3        17    10    1          97
Danio rerio             0      1      69     13       36    26    7          152
Gasterosteus aculeatus  0      3      80     4        18    3     1          109
Oryzias latipes         0      0      39     4        10    14    3          70
Takifugu rubripes       0      0      41     2        4     6     1          54
Tetraodon nigroviridis  0      0      33     2        2     6     1          44



A potential functional gene is a sequence that contains no nonsense or frameshift mutation, as re-checked by BLAST searches against the Swiss-Prot database. Only proteins that gave an 'Olfactory receptor' hit and were longer than 270 amino acids were retained and defined as functional olfactory receptor-like genes.

* L. crocea possessed the highest number of genes classified into the "eta" group (30, P < 0.001). These genes may contribute to olfactory detection, which could be useful for feeding and migration (Li et al. 1995).
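
The retention rule described in the note to Table S18 can be illustrated as a simple filter. This is only a minimal sketch and not the authors' pipeline: the record layout, field names, and example entries are hypothetical, and the BLAST search itself is assumed to have been run beforehand, with its best Swiss-Prot hit description stored on each record.

```python
from dataclasses import dataclass

MIN_LENGTH_AA = 270  # length cutoff stated in the table note


@dataclass
class Candidate:
    """Hypothetical record for one candidate sequence (field names are assumptions)."""
    gene_id: str
    protein_length: int               # length in amino acids
    has_nonsense_or_frameshift: bool  # True if the coding sequence is disrupted
    best_hit_description: str         # description of the best Swiss-Prot BLAST hit


def is_functional_or_like(c: Candidate) -> bool:
    """Apply the three criteria from the note: intact ORF, OR hit, > 270 aa."""
    return (
        not c.has_nonsense_or_frameshift
        and "olfactory receptor" in c.best_hit_description.lower()
        and c.protein_length > MIN_LENGTH_AA
    )


# Made-up example records, for illustration only.
candidates = [
    Candidate("cand_1", 312, False, "Olfactory receptor 52K1"),
    Candidate("cand_2", 250, False, "Olfactory receptor 2AT4"),            # too short
    Candidate("cand_3", 330, True, "Olfactory receptor 51E1"),             # disrupted ORF
    Candidate("cand_4", 340, False, "Trace amine-associated receptor 1"),  # not an OR hit
]

retained = [c.gene_id for c in candidates if is_functional_or_like(c)]
print(retained)
```

Only the first made-up record passes all three criteria; the others each fail exactly one of them.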

Supplemental Table S19: Copy number of auditory sense-related genes in seven sequenced teleost species

Gene      Larimichthys  Danio   Gadus   Gasterosteus  Oryzias  Takifugu  Tetraodon
name      crocea        rerio*  morhua  aculeatus     latipes  rubripes  nigroviridis
OTOF      5             3       1       1             1        2         0
claudinj  24            24      15      13            18       15        15
OTOL1     11            7       5       6             8        8         5

For effective communication, fish have developed high sensitivity to environmental sound. Three important auditory genes, OTOF, claudinj, and OTOL1, were significantly expanded in the L. crocea genome (P < 0.01). These expansions may contribute to the detection of sound signals during communication, and thus to reproduction and survival. Genes are abbreviated as OTOF: otoferlin; OTOL1: otolin-1.

* One more round of genome duplication occurred in Danio rerio relative to the other fish species.




Supplemental Table S20: Comparison of the genes encoding selenoproteins between L. crocea and other sequenced vertebrate species

Species:    Larimichthys  Oryzias       Takifugu      Danio         Gasterosteus  Tetraodon     Homo          Mus
            crocea        latipes       rubripes      rerio         aculeatus     nigroviridis  sapiens       musculus
Gene name:  DI1           DI1           DI1           DI1           DI1           DI1           DI1           DI1
            DI2           DI2           DI2           DI2           DI2           DI2           DI2           DI2
            DI3           DI3           DI3           DI3           DI3           DI3           DI3           DI3
            DI3b          DI3b          DI3b          DI3b          DI3b          DI3b          -             -
            GPx1a         GPx1          GPx1          GPx1          GPx1          GPx1          GPx1          GPx1
            GPx1b         -             GPx1b         GPx1b         GPx1b         GPx1b         -             -
            GPx1b2        -             -             -             -             -             -             -
            -             GPx2          GPx2          GPx2          GPx2          GPx2          GPx2          GPx2
            GPx3          -             GPx3          -             GPx3          GPx3          GPx3          GPx3
            -             GPx3b         GPx3b         GPx3b         GPx3b         GPx3b         -             -
            GPx4a         GPx4          GPx4          GPx4          GPx4          GPx4          GPx4          GPx4
            GPx4b         GPx4b         GPx4b         GPx4b         GPx4b         GPx4b         -             -
            GPx4b2        -             -             -             -             -             -             -
            GPx6          -             -             -             -             -             GPx6          -
            -             Fep15         Fep15         Fep15         Fep15         Fep15         -             -
            Sel15         Sel15         Sel15         Sel15         Sel15         Sel15         Sel15         Sel15
            SelL          SelL          SelL          SelL          SelL          SelL          -             -
            SelH          SelH          SelH          SelH          SelH          SelH          SelH          SelH
            SelJ1         SelJ          SelJ          SelJ          SelJ          SelJ          -             -
            SelJ2         SelJ2         -             -             SelJ2         -             -             -
            SelI          SelI          SelI          SelI          SelI          SelI          SelI          SelI
            SelK          SelK          SelK          SelK          SelK          SelK          SelK          SelK
            SelM          SelM          SelM          SelM          SelM          SelM          SelM          SelM
            SelN          SelN          SelN          SelN          SelN          SelN          SelN          SelN
            SelO1         SelO          SelO          SelO          SelO          SelO          SelO          SelO
            SelO2         -             -             SelO2         -             -             -             -
            SelP          SelP          SelP          SelP          SelP          SelP          SelP          SelP
            SelPb         SelPb         SelPb         SelPb         SelPb         SelPb         -             -
            MsrB1a        MsrB1         MsrB1         MsrB1         MsrB1         MsrB1         MsrB1         MsrB1
            MsrB1b        MsrB1b        MsrB1b        MsrB1b        MsrB1b        MsrB1b        -             -
            MsrB1c        -             -             -             -             -             -             -
            MsrB1d        -             -             -             -             -             -             -
            MsrB1e        -             -             -             -             -             -             -
            SelS          SelS          SelS          SelS          SelS          SelS          SelS          SelS
            SelT1a        SelT1         SelT1         SelT1         SelT1         SelT1         SelT1         SelT1
            -             -             -             SelT1b        -             -             -             -
            SelT2         SelT2         SelT2         SelT2         SelT2         SelT2         -             -







SelUa 
SelUb 
SelUb2 

SelW2 



SPS2 

TR2 
TR3 



SelU1 SelU1 SelU1 SelU1
SelU1b
SelU1c SelU1c SelU1c SelU1c
SelW1
SelW2 SelW2 SelW2 SelW2
SelW2b
SelW2c SelW2c SelW2c
SPS2a SPS2a SPS2a SPS2a
TR1 TR1 TR1 TR1
TR3 TR3 TR3 TR3

SelU1
SelU1b
SelU1c
SelW2
SPS2a
TR1
TR3

SelV
SelU1
SelW1

SelV
SelU1
SelW1

SPS2b
TR1
TR3

SPS2b
TR1
TR3



Total:      40            35            36            38            36            35            25            24






Genes are abbreviated as DI: iodothyronine deiodinase; GPx: glutathione peroxidase; Fep15: fish 15 kDa selenoprotein-like protein; Sel: selenoprotein; Msr: methionine sulfoxide reductase; SPS: selenophosphate synthetase; TR: thioredoxin reductase.
Supplemental Table S21: Characterization of the L. crocea immune system

                   Categories                              Gene number
Innate immunity    Pattern recognition receptors           179
                   Antimicrobial peptides                  17
                   Complement system                       119
                   Lectins                                 265
                   Interferon system                       70
                   Interleukin-1 family                    29
                   Tumor necrosis factor family            13
                   Scavenger receptor family               17
                   Immune negative regulators              18
                   Immune signalling factors               30
                   Chemokines                              62
                   Total                                   819
Adaptive immunity  Antigen-presentation system             176
                   T-cell lineage markers                  168
                   B-cell lineage markers                  65
                   Plasma cell markers                     9
                   Memory T/B cell markers                 62
                   T/B cell development related genes      52
                   Gene rearrangement-related factors      168
                   Immunoglobulins and Ig family members   1005
                   Total                                   1705
Total of immune-relevant genes                             2524
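
As a consistency check on Table S21, the subtotals and the grand total can be recomputed from the category counts. The sketch below simply transcribes the counts from the table; the variable names are ours and are not part of the original analysis.

```python
# Gene counts per category, transcribed from Supplemental Table S21.
innate = {
    "Pattern recognition receptors": 179,
    "Antimicrobial peptides": 17,
    "Complement system": 119,
    "Lectins": 265,
    "Interferon system": 70,
    "Interleukin-1 family": 29,
    "Tumor necrosis factor family": 13,
    "Scavenger receptor family": 17,
    "Immune negative regulators": 18,
    "Immune signalling factors": 30,
    "Chemokines": 62,
}
adaptive = {
    "Antigen-presentation system": 176,
    "T-cell lineage markers": 168,
    "B-cell lineage markers": 65,
    "Plasma cell markers": 9,
    "Memory T/B cell markers": 62,
    "T/B cell development related genes": 52,
    "Gene rearrangement-related factors": 168,
    "Immunoglobulins and Ig family members": 1005,
}

innate_total = sum(innate.values())
adaptive_total = sum(adaptive.values())
grand_total = innate_total + adaptive_total

print(innate_total, adaptive_total, grand_total)  # 819 1705 2524
```

The recomputed values agree with the totals reported in the table (819, 1705, and 2524).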



Supplemental Table S22: Number of genes related to immunity in L. crocea and six other fish genomes

Gene      Larimichthys  Danio  Gadus   Gasterosteus  Oryzias  Takifugu  Tetraodon
          crocea        rerio  morhua  aculeatus     latipes  rubripes  nigroviridis
NLRC3     76            290    1       3             6        19        14
TRIM25    54            80     15      16            42       14        12
IgHV-CAM  38            2      1       12            12       37        19
Mep1b     24            10     9       13            11       14        11
CLEC17A   21            37     9       10            9        8         10
C1ql4     19            14     12      8             9        6         9
Gimap8    18            33     1       3             7        2         0
IFI44     13            17     8       8             3        10        6
EEF1A1    10            3      3       4             3        3         4
LRRC70    8             4      4       4             3        3         4
Mrc1      7             2      3       4             4        4         2
Tnfsf14   5             10     8       1             1        0         0
VTCN1     4             1      0       1             5        0         1
Bax       4             5      2       2             1        2         1
cGAS      3             1      0       0             0        0         0
DDX41     3             1      1       1             1        1         1
IGSF9B    2             1      0       1             0        0         2



Note: Genes are abbreviated as NLRC3: NOD-like receptor family CARD domain containing 3; TRIM25: tripartite motif-containing protein 25; IgHV-CAM: Ig heavy chain V-III region CAM; Mep1b: Meprin A subunit beta; CLEC17A: C-type lectin domain family 17, member A; C1ql4: Complement C1q-like protein 4; Gimap8: GTPase IMAP family member 8; IFI44: Interferon-induced protein 44; EEF1A1: Elongation factor 1-alpha 1; LRRC70: Leucine-rich repeat-containing protein 70; Mrc1: Macrophage mannose receptor 1; Tnfsf14: Tumor necrosis factor receptor superfamily member 14; VTCN1: V-set domain-containing T-cell activation inhibitor 1; Bax: Apoptosis regulator BAX; cGAS: Cyclic GMP-AMP synthase; DDX41: Probable ATP-dependent RNA helicase DDX41; IGSF9B: Protein turtle homolog B.

Supplemental Table S23: Expression profiles of potential nerve-endocrine-immunity network-related genes in the L. crocea brain under hypoxia

Gene ID              Br0h   Br1h   Br3h   Br6h   Br12h  Br24h  Br48h  Gene function
Lcro_GLEAN_10017820  51.35  9.81   25.12  32.98  6.79   16.34  25.41  Corticoliberin (CRF)
Lcro_GLEAN_10017036  6.04   1.27   6.04   1.77   3.23   4.14   5.64   Corticotropin-releasing factor receptor 1 (CRFR1)
Lcro_GLEAN_10000999  16.16  0.69   1.69   9.98   3.76   1.11   25.04  Pro-opiomelanocortin (POMC)
Lcro_GLEAN_10016699  10.02  1.32   9.72   15.80  3.79   8.60   12.79  Corticotropin-releasing factor-binding protein (CRFBP)
Lcro_GLEAN_10001253  1.02   1.48   0.19   0.00   0.71   0.64   0.31   Endothelin-1 (ET-1)
Lcro_GLEAN_10002438  1.10   1.89   0.17   1.28   2.14   1.64   1.30   Adrenomedullin (ADM)
Lcro_GLEAN_10026693  0.00   0.10   0.10   0.15   0.10   0.44   0.26   Interleukin-6 (IL-6)
Lcro_GLEAN_10024982  0.00   0.27   0.00   0.00   0.18   0.32   0.31   Tumor necrosis factor-α (TNF-α)
Lcro_GLEAN_10017000  1.58   1.33   0.68   2.85   2.55   0.83   1.12   Suppressor of cytokine signaling 3 (SOCS-3)
Lcro_GLEAN_10019455  8.16   6.42   6.41   14.53  13.91  6.06   4.61   Suppressor of cytokine signaling 1 (SOCS-1)
Lcro_GLEAN_10023286  0.47   0.94   0.52   1.53   0.49   0.74   0.14   Interleukin-1 beta (IL-1β)
Supplemental Table S24: Expression profiles of nerve-endocrine-metabolism network-related genes in the L. crocea brain under hypoxia

Gene ID              Br0h   Br1h   Br3h   Br6h   Br12h  Br24h  Br48h  Gene function

HPT axis related genes


















Lcro_GLEAN_10026052  1.00                 -1.42  -2.77  -2.20  -1.17  Prothyroliberin type A (TRH)


Lcro_GLEAN_10020966  1.00   -3.87  -2.06  -2.79  -2.16  -1.62  -1.30  Thyrotropin-releasing hormone receptor (TRHR)
Lcro_GLEAN_10001346  1.00   -1.66  -1.87  -8.96  -1.01  1.21   1.22   Thyrotropin-releasing hormone-degrading ectoenzyme (TRHDE)
Lcro_GLEAN_10017386  1.00   0.00   0.00   -1.23                -1.09  Thyrotropin subunit beta (TSH)
Lcro_GLEAN_10006930  1.00   1.09   0.00   -1.23  -3.84  -1.06         Thyrotropin receptor (TSHR)
Lcro_GLEAN_10005317  1.00   0.00   -4.98  -3.38  -3.52  -1.94  1.42   Thyrotropin receptor (TSHR)
Lcro_GLEAN_10017014  1.00          1.23   -1.46  -1.31  1.00   1.07   Thyroid hormone receptor alpha (TRα)
Lcro_GLEAN_10004840  1.00   1.25   1.37   -2.52  1.51   1.61          Thyroid hormone receptor beta (TRβ)

Glycolysis-related genes
Lcro_GLEAN_10003494  1.00   1.06   1.24   1.04          1.23   1.12   Hexokinase (HK)
Lcro_GLEAN_10019601  1.00                 -1.53                       Glucose phosphate isomerase (GPI)
Lcro_GLEAN_10023192  1.00   1.13   1.33   1.65   1.58   -1.06  1.11   Phosphofructokinase (PFK)
Lcro_GLEAN_10010629  1.00                                             Fructose-bisphosphate aldolase (ALDOA)
Lcro_GLEAN_10018335  1.00                                             Triose phosphate isomerase (TPI)
Lcro_GLEAN_10005291  1.00                                             Glyceraldehyde-3-phosphate dehydrogenase (GAPDH)
Lcro_GLEAN_10000869  1.00                                             Phosphoglycerate mutase (PGAM)
Lcro_GLEAN_10024929  1.00                                             Enolase (ENO)
Lcro_GLEAN_10005020  1.00                                             Pyruvate kinase (PKM)
Lcro_GLEAN_10018122  1.00                                             Pyruvate kinase (PKM)
Lcro_GLEAN_10011584  1.00                                             Lactate dehydrogenase (LDH)

Genes in the TCA cycle
Lcro_GLEAN_10026134  1.00   1.09   1.03   -1.39  -1.14  -1.20  -1.02  Citrate synthase (CS)
Lcro_GLEAN_10006072  1.00   -1.06  1.12   1.17   -1.30  -1.25  -1.29  PDC E1 beta
Lcro_GLEAN_10025927  1.00   -1.12  1.25   1.06   -1.19  -1.32  -1.01  PDC E1 alpha
Lcro_GLEAN_10022050  1.00   1.01   1.40   -1.65  -1.06  -1.08  1.06   Aconitase (ACO)
Lcro_GLEAN_10011451  1.00   1.08   1.19   1.49   -1.28  -1.22  -1.25  Isocitrate dehydrogenase (IDH)
Lcro_GLEAN_10023512  1.00   -1.07  1.32   -1.25  -1.01  -1.11  -1.05  Alpha-ketoglutarate dehydrogenase (α-KGDH)
Lcro_GLEAN_10013616  1.00   -1.18  -1.06  1.48   -1.33  -1.39  -1.23  Succinyl-CoA synthetase (SCS)
Lcro_GLEAN_10013518  1.00   -1.08  1.07   1.44   -1.20  -1.23  -1.44  Fumarate hydratase (FH)
Lcro_GLEAN_10021353  1.00   1.02   1.29   1.38   -1.11  -1.18  -1.14  Malic/Malate dehydrogenase (MDH)

Lcro_GLEAN_10013837  1.00   -1.18  1.13   -1.18  1.18   -1.03  -1.12  Hypoxia-inducible factor 1-alpha (HIF-1α)
Lcro_GLEAN_10018081  1.00   1.04   1.57   -1.06  -1.23  -1.03  1.09   Hypoxia-inducible factor 2-alpha (HIF-2α)
Lcro_GLEAN_10009599  1.00   1.06   1.31   1.13          1.07   1.06   Hypoxia-inducible factor 3-alpha (HIF-3α)
Lcro_GLEAN_10011194  1.00   -1.33  1.10   -1.59  1.11   1.12   1.09   Hypoxia-inducible factor beta (HIF-β)



Red backgrounds indicate genes whose expression was up-regulated in the L. crocea brain under hypoxia, and green backgrounds indicate genes whose expression was down-regulated. The numbers indicate the fold change relative to the control (brain tissue harvested at 0 h).
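
The fold-change values in Table S24 appear to follow a signed convention: the control is 1.00, values of 1 or more indicate up-regulation, and negative values indicate down-regulation. The table note does not spell this out, so the sketch below is only one plausible reading, in which down-regulation is written as the negative reciprocal of the expression ratio. The function name and the example expression values are hypothetical.

```python
from typing import Optional


def signed_fold_change(control: float, treated: float) -> Optional[float]:
    """Signed fold change of `treated` relative to `control`.

    Assumed convention (not stated in the source): ratios >= 1 are reported
    as-is; ratios < 1 are reported as the negative reciprocal, so a halving
    of expression becomes -2.0. Returns None when either value is not
    positive, because the ratio is then undefined.
    """
    if control <= 0 or treated <= 0:
        return None
    ratio = treated / control
    return ratio if ratio >= 1.0 else -1.0 / ratio


# Hypothetical expression values (arbitrary units), with 0 h as the control.
control_0h = 10.0
for label, value in [("0 h", 10.0), ("1 h", 2.5), ("3 h", 15.0)]:
    print(label, signed_fold_change(control_0h, value))
```

Under this convention no value can fall strictly between -1 and 1, which matches the non-zero entries in the table.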
Supplemental Table S25: Expression profiles of protein synthesis-related genes in the L. crocea brain under hypoxia

Gene ID              Br0h   Br1h   Br3h   Br6h   Br12h  Br24h  Br48h  Gene function



Lcro_ 




_ 1 UU 1 J /40 


i 
i 


fin 


Lcro_ 


m paw 




i 
i 


uu 


Lcro_GLEAN_10022699  1.00
Lcro_GLEAN_10007379  1.00
Lcro_GLEAN_10007811  1.00
Lcro_GLEAN_10010382  1.00
Lcro_GLEAN_10024989  1.00
Lcro_GLEAN_10011811  1.00
Lcro_GLEAN_10021549  1.00
Lcro_GLEAN_10026525  1.00
Lcro_GLEAN_10012448  1.00
Lcro_GLEAN_10011310  1.00
Lcro_GLEAN_10010804  1.00
Lcro_GLEAN_10005848  1.00
Red backgrounds indicate genes whose expression was up-regulated in the L. crocea brain under hypoxia, and green backgrounds indicate genes whose expression was down-regulated. The numbers indicate the fold change relative to the control (brain tissue harvested at 0 h).
Supplemental Table S26: Genes involved in mucin biosynthesis and mucus production

Functional Group                     Gene Symbol
Mucins                               Muc1, Muc2, Muc4, Muc5B, Muc5AC, Muc15, Muc19
PDI                                  PDIA1, PDIA3, PDIA4, PDIA6
Glycosyl transferases (Initiating)   GALNT1, GALNT2, GALNT3, GALNT4, GALNT5, GALNT6, GALNT7,
                                     GALNT8, GALNT9, GALNT10, GALNT11, GALNT12, GALNT13,
                                     GALNT14
Glycosyl transferases (Core)         C1GALT1, C1GALT2, GCNT1, GCNT3, GCNT4, GCNT7
Glycosyl transferases (Elongating)   B3GNT1, B3GNT2, B3GNT3, B3GNT5, B3GNT7, B3GALT1,
                                     B3GALT2, B3GALT4, B3GALT6, B4GALT1, B4GALT3, B4GALT4,
                                     B4GALT5, B4GALT6, B4GALT7






Glycosyl transferases (Peripheral)   SIAT1, SIAT?, SIAT4C, SIAT6, SIAT7A, SIAT7B, SIAT7D, SIAT7E,
                                     SIAT7F, SIAT8A, SIAT8F, CHST1, CHST?, CHST6, CHST7,
                                     CHST8, CHST10, CHST11, CHST12, CHST13, CHST15, GAL3ST1,
                                     FucT-1


Receptors                            VDRA, VDRB, EGFR
RAB                                  RAB1A, RAB2A, RAB3, RAB3A, RAB3B, RAB3C, RAB3D, RAB4A,
                                     RAB4B, RAB5, RAB5A, RAB5B, RAB5C, RAB6A, RAB6B, RAB6C,
                                     RAB7A, RAB8A, RAB8B, RAB9A, RAB9B, RAB10, RAB11B, RAB12,
                                     RAB13, RAB14, RAB15, RAB17, RAB18, RAB19, RAB20, RAB21,
                                     RAB22A, RAB23, RAB24, RAB25, RAB26, RAB27A, RAB27B, RAB28,
                                     RAB30, RAB31, RAB32, RAB33B, RAB34, RAB35, RAB36, RAB37,
                                     RAB38, RAB39A, RAB39B, RAB40C, RAB44
SNARE                                STX1A, STX1B, STX4, STX5, STX6, STX7, STX8, STX10, STX11,
                                     STX12, STX16, STX17, STX18, STX19, STXBP1, STXBP5, VAMP2,
                                     VAMP4, VAMP5, VAMP7, VAMP8
Ion channels, ion pumps and          ATP1A1, ATP1A3, ATP1B3, SLC12A2, SLC12A7, SLC4A2, SLC4A4,
transporters                         SLC4A5, CFTR, ATP2A1, ATP2A2, ATP2A3
Regulation factors                   MARCKS, PKCα, PKCβ, PKCι, PKCδ, PKCθ, PKCζ, PKCη, PKCε



Genes are abbreviated as Muc: mucin; C1GALT: Glycoprotein-N-acetylgalactosamine 3-beta-galactosyltransferase; GALNT: Polypeptide N-acetylgalactosaminyltransferase; PDIA: protein disulfide isomerase family A; GCNT: Beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase; B3GNT: UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase; B3GALT: beta-1,3-galactosyltransferase; SIAT: CMP-N-acetylneuraminate-beta-1,4-galactoside alpha-2,3-sialyltransferase; GAL3ST1: Galactosylceramide sulfotransferase; FucT-1: GDP-fucose transporter 1; VDR: Vitamin D3 receptor; EGFR: Epidermal growth factor receptor; RAB: Ras-related protein; STX: Syntaxin; STXBP1: Syntaxin-binding protein 1; VAMP: Vesicle-associated membrane protein; ATP1A: Sodium/potassium-transporting ATPase subunit alpha; ATP1B: Sodium/potassium-transporting ATPase subunit beta; SLC: Solute carrier family; CFTR: Cystic fibrosis transmembrane conductance regulator; ATP2A: Sarcoplasmic/endoplasmic reticulum calcium ATPase; MARCKS: Myristoylated alanine-rich C-kinase substrate; PKC: Protein kinase C.
Supplemental Table S27: Summary of MS/MS spectra and proteins identified in the L. crocea skin mucus under air exposure

Classification                             Number   Percentage (%)
Total Spectra                              636,059  100.00
Identified Spectra                         171,349  26.94
Identified Peptides                        25,026   3.93
Identified Proteins                        4,489    17.67
Identified Proteins (Unique peptides > 2)  3,209    12.63
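
The percentages in Table S27 can be checked with a short calculation. The table does not state the denominators, so the sketch below rests on two assumptions: that the spectra and peptide percentages are relative to the total number of spectra (which reproduces the reported 26.94 and 3.93), and that the protein percentages use a different reference set, whose size is only back-calculated here from the reported values.

```python
def percent(part: float, whole: float) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals as in the table."""
    return round(100.0 * part / whole, 2)


# Counts transcribed from Supplemental Table S27.
total_spectra = 636_059
identified_spectra = 171_349
identified_peptides = 25_026

spectra_pct = percent(identified_spectra, total_spectra)
peptide_pct = percent(identified_peptides, total_spectra)

# Protein rows: (count, reported percentage). The reference set is not given
# in the table, so its size is inferred from count / (percentage / 100).
protein_rows = {
    "all identified proteins": (4_489, 17.67),
    "unique peptides > 2": (3_209, 12.63),
}
implied_denominators = {
    label: round(count / (pct / 100.0))
    for label, (count, pct) in protein_rows.items()
}

print(spectra_pct, peptide_pct)
print(implied_denominators)
```

Both protein rows imply a reference set of roughly 25,400 entries, which suggests that the protein percentages are expressed relative to a protein set of that size rather than to the spectra, although the table itself does not say so.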



Supplemental Table S28: Antioxidant proteins identified in the L. crocea mucus proteome



Protein name 



Cytochrome c oxidase subunit 4 isoform 1, mitochondrial

Cytochrome c oxidase subunit 5A, mitochondrial

Cytochrome c oxidase subunit 5A, mitochondrial

Cytochrome c oxidase subunit 6B1

Cytochrome c oxidase subunit 6A, mitochondrial

Cytochrome c oxidase subunit 7A-related protein, mitochondrial

Phenylalanine-4-hydroxylase 

Ubiquinone biosynthesis monooxygenase COQ6 

Prostaglandin G/H synthase 1 

Phospholipid hydroperoxide glutathione peroxidase, mitochondrial
Eosinophil peroxidase 
Eosinophil peroxidase 

Phosphatidylinositol-4-phosphate 5-kinase type-1 gamma 

Glutathione peroxidase 1 

Glutathione peroxidase 1 

Glutathione peroxidase 7 

Thioredoxin-like protein 1 

Thioredoxin 

Thioredoxin domain-containing protein 5 
Thioredoxin, mitochondrial 
Superoxide dismutase [Cu-Zn] 

NADH dehydrogenase iron-sulfur protein 2, mitochondrial 

Prostamide/prostaglandin F synthase 

Peroxiredoxin-6 

Peroxiredoxin 

Peroxiredoxin 

Thioredoxin-dependent peroxide reductase, mitochondrial 
Peroxiredoxin-4 

L-2-hydroxyglutarate dehydrogenase, mitochondrial 
L-lactate dehydrogenase B chain 
L-lactate dehydrogenase A chain 
Peroxiredoxin-5, mitochondrial 
Apoptosis-inducing factor 1, mitochondrial 
Extracellular superoxide dismutase [Cu-Zn] 
Amine oxidase [flavin-containing] A 
NADH-cytochrome b5 reductase 3 

Alpha-aminoadipic semialdehyde synthase, mitochondrial
Alcohol dehydrogenase [NADP(+)] B
Aflatoxin B1 aldehyde reductase member 2
Uncharacterized oxidoreductase MSMEG_2408
Aldo-keto reductase family 1 member B10
Aldo-keto reductase family 1 member B10
Aldo-keto reductase family 1 member B10
Sulfide:quinone oxidoreductase, mitochondrial
UDP-glucose 6-dehydrogenase 

Glyceraldehyde 3-phosphate dehydrogenase, testis-specific 
Glyceraldehyde 3-phosphate dehydrogenase, testis-specific 
Glyceraldehyde-3-phosphate dehydrogenase 
Prenylcysteine oxidase 



Epidermis-type lipoxygenase 3 
Epidermis-type lipoxygenase 3 
Epidermis-type lipoxygenase 3 
Prolyl 3-hydroxylase 1 
Ceruloplasmin 

NADH dehydrogenase [ubiquinone] flavoprotein 

mitochondrial 

L-amino-acid oxidase 

Extended synaptotagmin-2-A 

Zinc finger protein AEBP2 

NADH dehydrogenase flavoprotein 1, mitochondrial 

Glutathione reductase, mitochondrial 

NADH dehydrogenase iron-sulfur protein 4, mitochondrial 

ERO1-like protein alpha

ERO1-like protein beta

Malate dehydrogenase, cytoplasmic 

Malate dehydrogenase, cytoplasmic 

Malate dehydrogenase, mitochondrial 

Uricase 

Ubiquitin-conjugating enzyme E2 variant 3 

Glutamate dehydrogenase 1, mitochondrial 

Glutamate dehydrogenase, mitochondrial 

Hydroxyacyl-coenzyme A dehydrogenase, mitochondrial 

Lambda-crystallin homolog 

Protein disulfide-isomerase 

Protein disulfide-isomerase A3 

Protein disulfide-isomerase 

Protein disulfide-isomerase A5 

Protein disulfide-isomerase A4 

Protein disulfide-isomerase A6 

Protein disulfide-isomerase TMX3 

Mitochondrial sodium/hydrogen exchanger 9B2 

Cytochrome b-c1 complex subunit Rieske, mitochondrial

NADH dehydrogenase iron-sulfur protein 7, mitochondrial 

NA 

Dihydropteridine reductase 
Hydroxysteroid dehydrogenase-like protein 2 
Peroxisomal multifunctional enzyme type 2 
Carbonyl reductase [NADPH] 1 
15-hydroxyprostaglandin dehydrogenase [NAD(+)] 
3-hydroxyacyl-CoA dehydrogenase type-2 
3-oxoacyl-[acyl-carrier-protein] reductase FabG 
Dehydrogenase/reductase SDR family member 12 
Dehydrogenase/reductase SDR family member 11
C-factor

2,4-dienoyl-CoA reductase, mitochondrial
Dehydrogenase/reductase SDR family member 11
Estradiol 17-beta-dehydrogenase 12-A 
Estradiol 17-beta-dehydrogenase 12-B 
Retinol dehydrogenase 12 
Mitochondrial peptide methionine sulfoxide reductase
Mitochondrial peptide methionine sulfoxide reductase
Isocitrate dehydrogenase [NAD] subunit gamma 1, mitochondrial

Cytosolic 10-formyltetrahydrofolate dehydrogenase 
Isocitrate dehydrogenase [NADP] cytoplasmic 
Isocitrate dehydrogenase [NADP], mitochondrial 
Isocitrate dehydrogenase [NADP], mitochondrial 
Epidermis-type lipoxygenase 3 

Isocitrate dehydrogenase [NAD] subunit alpha, mitochondrial 
Isocitrate dehydrogenase [NAD] subunit beta, mitochondrial 
Dehydrogenase/reductase SDR family member 4 (Fragment) 
Branched-chain-amino-acid aminotransferase, cytosolic 
Calcium-transporting ATPase type 2C member 1 
Trifunctional enzyme subunit alpha, mitochondrial 
Alkyldihydroxyacetonephosphate synthase, peroxisomal 
Prolyl 4-hydroxylase subunit alpha-1

Very long-chain specific acyl-CoA dehydrogenase, mitochondrial

Long-chain specific acyl-CoA dehydrogenase, mitochondrial 

Glutaryl-CoA dehydrogenase, mitochondrial 

Isovaleryl-CoA dehydrogenase, mitochondrial 

Short-chain specific acyl-CoA dehydrogenase, mitochondrial 

Short/branched chain specific acyl-CoA dehydrogenase, mitochondrial

Aldehyde dehydrogenase family 3 member B1

Fatty aldehyde dehydrogenase 

Fatty aldehyde dehydrogenase 

Sarcosine dehydrogenase, mitochondrial 

Dihydrolipoyl dehydrogenase, mitochondrial 

NADH-cytochrome b5 reductase 2 

Delta-1-pyrroline-5-carboxylate synthase

Glycerol-3-phosphate dehydrogenase [NAD(+)], cytoplasmic 

Glycerol-3-phosphate dehydrogenase [NAD(+)], cytoplasmic 

Glycerol-3-phosphate dehydrogenase [NAD(+)], cytoplasmic 

Glycerol-3-phosphate dehydrogenase, mitochondrial 

NADP-dependent malic enzyme, mitochondrial 

NAD-dependent malic enzyme, mitochondrial 

Probable 2-oxoglutarate dehydrogenase E1 component DHKTD1, mitochondrial

Methylmalonate-semialdehyde dehydrogenase, mitochondrial 
Methylmalonate-semialdehyde dehydrogenase, mitochondrial 
2-oxoglutarate dehydrogenase, mitochondrial 

2-oxoglutarate dehydrogenase-like, mitochondrial
NADP-dependent malic enzyme

3-hydroxyisobutyrate dehydrogenase, mitochondrial
3-hydroxyisobutyrate dehydrogenase, mitochondrial 
Arachidonate 15-lipoxygenase B 
Epidermis-type lipoxygenase 3 



Peroxisomal 2,4-dienoyl-CoA reductase 
Peroxisomal trans-2-enoyl-CoA reductase 

Pyruvate dehydrogenase E1 component subunit alpha, somatic form, mitochondrial
Retinol dehydrogenase 3 

Dehydrogenase/reductase SDR family member 13 
Malate dehydrogenase 

Alpha-aminoadipic semialdehyde dehydrogenase

Aldehyde dehydrogenase, mitochondrial

Aldehyde dehydrogenase family 9 member A1

Aldehyde dehydrogenase, mitochondrial

Delta-1-pyrroline-5-carboxylate dehydrogenase, mitochondrial

Aldehyde dehydrogenase family 9 member A1-B

Aldehyde dehydrogenase family 16 member A1

Retinal dehydrogenase 2 

C-terminal-binding protein 1 

Glyoxylate reductase/hydroxypyruvate reductase 

C-terminal-binding protein 1 

Medium-chain specific acyl-CoA dehydrogenase, mitochondrial 
2-oxoisovalerate dehydrogenase subunit alpha, mitochondrial 
Alcohol dehydrogenase class-3 chain L 
Quinone oxidoreductase 

Synaptic vesicle membrane protein VAT-1 homolog 

Synaptic vesicle membrane protein VAT-1 homolog 
Sorbitol dehydrogenase 
Prostaglandin reductase 1 

Zinc-binding alcohol dehydrogenase domain-containing protein 2 

Alcohol dehydrogenase 1 

Quinone oxidoreductase-like protein 1 

Alcohol dehydrogenase class-3 

NAD(P) transhydrogenase, mitochondrial 

Glutaredoxin-1 

Glutaredoxin-2, mitochondrial 
Prostaglandin E synthase 2 
Glutaredoxin 3 

Glutaredoxin-related protein 5, mitochondrial 
NADH-ubiquinone oxidoreductase 75 kDa subunit, mitochondrial

NAD(P)H dehydrogenase [quinone] 1 

Succinate dehydrogenase flavoprotein subunit, mitochondrial 

Putative oxidoreductase GLYR1 

D-3-phosphoglycerate dehydrogenase 

Prolyl 4-hydroxylase subunit alpha-1

Prolyl 4-hydroxylase subunit alpha-2 

Glutathione S-transferase kappa 1

Thioredoxin reductase 3 (Fragment) 

Deleted in malignant brain tumors 1 protein 
Supplemental Table S29: Oxygen binding-related proteins identified in the L. crocea mucus proteome

Gene ID              Protein name
Lcro_GLEAN_10000061  Hemoglobin subunit alpha-1
Lcro_GLEAN_10005683  Hemoglobin subunit beta
Lcro_GLEAN_10008557  Hemoglobin subunit beta
Lcro_GLEAN_10008556  Hemoglobin subunit alpha-A
Lcro_GLEAN_10011954  Hemoglobin subunit beta-2
Lcro_GLEAN_10005682  Hemoglobin subunit alpha
Lcro_GLEAN_10011722  Cytoglobin-1
Lcro_GLEAN_10011953  Hemoglobin subunit alpha-D



Supplemental Table S30: Immunity-related proteins identified in the L. crocea mucus proteome

Gene ID              Protein name
Lcro_GLEAN_10010544  Cathepsin B
Lcro_GLEAN_10000247  Cathepsin D
Lcro_GLEAN_10019933  Cathepsin D
Lcro_GLEAN_10006830  Cathepsin F
Lcro_GLEAN_10007429  Cathepsin K
Lcro_GLEAN_10010056  Cathepsin L


Lcro_GLEAN_10007394  Cathepsin L1
Lcro_GLEAN_10007428  Cathepsin S


Lcro. 


.(jbbAN. 


1 AAA 1 OOI 

_100012ol 


Cathepsin Z, 


Lcro_GLEAN_10012749  C-C motif chemokine 25
Lcro_GLEAN_10012720  C-C motif chemokine 4
Lcro_GLEAN_10000920  CD166 antigen homolog A
Lcro_GLEAN_10022694  CD2-associated protein
Lcro_GLEAN_10021102  CD59 glycoprotein
Lcro_GLEAN_10020907  CD63 antigen
Lcro_GLEAN_10007636  CD81 antigen
Lcro_GLEAN_10014353  CD9 antigen
Lcro_GLEAN_10022556  Complement C1q and tumor necrosis factor-related protein 9
Lcro_GLEAN_10014898  Complement C1q subcomponent subunit C


Lcro. 


JjbbAJN. 


.IUUUj /4/ 


Complement Clq tumor necrosis factor-related protein 4 


Lcro. 


.vjbbAJN. 


_ll)lA)863 / 


Complement Clq-like protein 4 


Lcro. 


JjbbAJN. 


1 aa 1 one 


Complement Clr-A subcomponent 


Lcro. 


.vjbbAJN. 


1 aa 1 on^ 


Complement C 1 s subcomponent 


Lcro. 


.(jbbAN. 


1 AAAC 1 /I o 

.10005143 


Complement C3 (Fragment) 


Lcro. 


.(jbbAN. 


1 AAA A CCA 

.10004550 


Complement C3 (Fragment) 


Lcro. 


.(jbbAN. 


1 AAA yicri 

.10004551 


Complement C3 (Fragment) 


Lcro. 


/■ 1 I I ' \ XT 

_(jbbAN_ 


1 AAA1 cm 

.10001592 


Complement C3 (Fragment) 


Lcro. 


_GLEAN_ 


.10016221 


Complement C3 (Fragment) 


Lcro. 


GLEAN. 


.10013690 


Complement C4-B 


Lcro. 


.GLEAN. 


.10005696 


Complement component 1 Q subcomponent-binding protein, mitochondrial 


Lcro. 


GLEAN. 


.10010837 


Complement component C6 


Lcro. 


.GLEAN, 


.10010838 


Complement component C7 


Lcro. 


.GLEAN. 


.10012074 


Complement component C7 


Lcro. 


.GLEAN. 


.10004265 


Complement component C8 alpha chain 


Lcro. 


.GLEAN. 


.10004266 


Complement component C8 beta chain 


Lcro. 


.GLEAN. 


.10024705 


Complement component C8 gamma chain 


Lcro. 


GLEAN. 


.10023871 


Complement component C9 


Lcro. 


.GLEAN. 


.10025891 


Complement factor B 
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Lcro_ 


.GLEAN. 


.10012700 


Complement factor D 


Lcro_ 


.GLEAN. 


.10004839 


Complement factor H 


Lcro_ 


.GLEAN. 


.10004869 


Complement factor H 


Lcro_ 


GLEAN. 


.10002965 


Heat shock 70 kDa protein 14 


Lcro_ 


GLEAN. 


.10017340 


Heat shock 70 kDa protein 4 


Lcro_ 


.GLEAN. 


.10015146 


Heat shock 70 kDa protein 4 


Lcro_ 


GLEAN. 


.10008388 


Heat shock cognate 70 kDa protein 


Lcro_ 


GLEAN. 


.10008390 


Heat shock cognate 71 kDa protein 


Lcro_ 


GLEAN. 


.10007537 


Heat shock protein 75 kDa, mitochondrial 


Lcro_ 


GLEAN. 


.10022324 


Heat shock protein HSP 90-alpha 


Lcro_ 


GLEAN. 


.10015783 


Heat shock protein HSP 90-beta 


Lcro_ 


GLEAN. 


.10006147 


Ig heavy chain V region 5-84 


Lcro_ 


.GLEAN. 


.10006137 


Ig heavy chain V region 5A 


Lcro_ 


GLEAN. 


.10006149 


Ig heavy chain V-III region CAM 


Lcro_ 


GLEAN. 


.10006141 


Ig heavy chain V-III region HIL 


Lcro_ 


GLEAN. 


.10000739 


Ig kappa chain C region 


Lcro_ 


.GLEAN. 


.10012845 


Ig kappa chain V-III region MOPC 63 


Lcro_ 


GLEAN. 


.10000738 


Ig kappa chain V-IV region JI 


Lcro_ 


GLEAN. 


.10025384 


Ig lambda chain V-III region LOI 


Lcro_ 


GLEAN. 


.10000744 


Ig lambda-6 chain C region 


Lcro_ 


GLEAN. 


.10025386 


Ig lambda-6 chain C region 


Lcro_ 


GLEAN. 


.10006127 


Ig mu chain C region membrane-bound form 


Lcro_ 


GLEAN. 


.10004635 


Immunoglobulin lambda-like polypeptide 5 


Lcro_ 


GLEAN. 


.10021327 


Immunoglobulin superfamily member 3 


Lcro_ 


.GLEAN. 


.10008198 


Lysozyme C 


Lcro_ 


GLEAN. 


.10008196 


Lysozyme C 


Lcro_ 


GLEAN. 


.10016863 


Lysozyme g 


Lcro_ 


GLEAN. 


.10002658 


Mannose-specific lectin 


Lcro_ 


GLEAN. 


.10019420 


Beta-galactoside-binding lectin 


Lcro_ 


GLEAN. 


.10018344 


Collectin-12 


Lcro_ 


GLEAN. 


.10005274 


C-type lectin domain family 4 member E 


Lcro_ 


GLEAN. 


.10024969 


Epiplakin 


Lcro_ 


.GLEAN. 


.10020509 


Fish-egg lectin 


Lcro_ 


GLEAN. 


.10000104 


Fish-egg lectin 


Lcro_ 


.GLEAN. 


.10000262 


Fucolectin-1 


Lcro_ 


GLEAN. 


.10008595 


Galectin-3 


Lcro_ 


GLEAN. 


.10025411 


Galectin-3 -binding protein A 


Lcro_ 


GLEAN. 


.10008735 


Galectin-8 


Lcro_ 


GLEAN. 


.10002304 


Galectin-9 


Lcro_ 


GLEAN. 


.10015893 


L-rhamnose-binding lectin CSL2 


Lcro_ 


.GLEAN. 


.10018984 


Malectin 


Lcro_ 


GLEAN. 


.10026183 


N-acetylaspartatesynthetase 


Lcro_ 


.GLEAN. 


.10025269 


Plasma kallikrein 


Lcro_ 


GLEAN. 


.10023966 


Plectin 
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1 Supplemental Table S31: Number of ion binding-related proteins identified in the L. 

crocea mucus proteome

GO ID       GO Term                Number
GO:0000287  magnesium ion binding  24
GO:0005509  calcium ion binding    102
GO:0006826  iron ion transport     4
GO:0005506  iron ion binding       28
GO:0005507  copper ion binding     2
GO:0008270  zinc ion binding       159
GO:0006820  anion transport        8
GO:0006812  cation transport       15
GO:0030001  metal ion transport    3
GO:0046872  metal ion binding      24
GO:0030145  manganese ion binding  7



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1 Supplemental notes 

2 Supplemental note 1: Organism background 

3 The large yellow croaker Larimichthys crocea (L. crocea) is a temperate-water migratory 

4 fish belonging to order Perciformes and family Sciaenidae. Its wild population is mainly 

5 distributed in the southern Yellow Sea, East China Sea, and northern South China Sea. L. 

6 crocea has a flat and long body with yellow or golden-yellow skin. It feeds on various 

7 groups of smaller fish and also on marine crustaceans such as shrimp and crab. L. crocea is 

8 one of the most economically important marine fish in China and East Asian countries due 

9 to its rich nutrients and trace elements, especially selenium. In China, the annual yield from 

10 L. crocea aquaculture exceeds that of any other net-cage-farmed marine fish species (Su 

11 2004; Mu et al. 2010). L. crocea also exhibits peculiar behavioral and physiological 

12 characteristics, such as loud sound production, high sensitivity to sound, and well-developed 

13 photosensitive and olfactory systems (Su 2004; Zhou et al. 2011). Most importantly, L. 

14 crocea is especially sensitive to various environmental stresses, such as hypoxia and air 

15 exposure. For example, the response of its brain to hypoxia is quick and robust, and a large 

16 amount of mucus is secreted from its skin when it is exposed to air (Gu and Xu 2011). These 

17 traits may render L. crocea a good model for investigating the response mechanisms to 

18 environmental stress. Several studies have reported transcriptomic and proteomic responses 

19 of L. crocea to pathogenic infections or immune stimuli (Mu et al. 2010; Yu et al. 2010; Mu 

20 et al. 2014). The effect of hypoxia on the blood physiology of L. crocea has been evaluated 

(Gu and Xu 2011). However, little is known about the molecular response mechanisms of L.

crocea against environmental stress. Additionally, the L. crocea genome consists of 48

telocentric chromosomes and has a high level of heterozygosity, as estimated by k-mer

analyses (Supplemental Fig. S1-S2).

3 Supplemental note 2: Genome sequence and assembly 

4 2.1. Sample preparation and sequencing 

5 The studies were carried out in strict accordance with the Regulations of the Administration 

6 of Affairs Concerning Experimental Animals established by the Fujian Provincial Department 

7 of Science and Technology. Animal experiments were approved by the Animal Care and Use 

8 Committee of the Third Institute of Oceanography, State Oceanic Administration. All surgery 

9 was performed under Tricaine-S anesthesia, and all efforts were made to minimize suffering. 

10 Wild individuals of L. crocea were collected from the Sanduao Sea area, Ningde, Fujian, 

11 China. Genomic DNA was isolated from the blood of a female fish for BAC library 

12 construction using standard molecular biology techniques. BAC-to-BAC strategy and 

13 whole-genome shotgun (WGS) methods were combined to obtain a high-quality assembly 

(Supplemental Fig. S1). BAC sequences were merged to build contig sequences, and

whole-genome shotgun sequences were used to orient the contigs into scaffold sequences and

fill gaps. For each BAC, a library was built with an insert size of 500 bp and sequenced

using the HiSeq 2000 system at BGI (Beijing Genomics Institute, Shenzhen, China). For

18 the whole-genome shotgun strategy, a series of libraries with insert sizes from 170 bp to 40 

19 kbp were built. To facilitate the genome analysis, quality control was performed by trimming 

20 low-quality reads and bases, and removing contaminated and duplicated reads to obtain the 

21 clean data. The following criteria were used to identify the reads that should be removed: 

22 1. Reads with > 10% unidentified nucleotides. 
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1 2. Reads from short-insert-size libraries having more than 65% bases with Q20<7, and 

2 reads from large-insert-size libraries that contained more than 80% bases with Q20<7. 

3 3. Reads with more than 10 bp aligned to the adapter sequence, allowing <2 bp mismatches. 

4 4. Small-insert-size paired-end reads that overlapped >10 bp with the corresponding paired 

5 end. 

6 5. Read 1 and read 2 of two paired-end reads that were completely identical (and thus 

7 considered to be the products of PCR duplication). 
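
The read-level criteria above can be sketched as a small filter. This is a minimal illustration, not the pipeline actually used: the `ReadPair` type and function names are hypothetical, the low-quality cutoff is exposed as a parameter because the threshold notation in the text is ambiguous, only one copy of each duplicated pair is kept (an assumption), and criteria 3 and 4 (adapter alignment and paired-end overlap) are omitted because they require an aligner.

```python
from typing import Iterable, Iterator, List, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical container for one paired-end read."""
    seq1: str
    qual1: List[int]  # per-base Phred scores of read 1
    seq2: str
    qual2: List[int]  # per-base Phred scores of read 2


def frac_unknown(seq: str) -> float:
    """Fraction of unidentified nucleotides (N) in a read."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def frac_low_quality(quals: List[int], cutoff: int) -> float:
    """Fraction of bases whose quality score is below `cutoff`."""
    return sum(q < cutoff for q in quals) / len(quals) if quals else 1.0


def passes_quality(pair: ReadPair, max_n_frac: float = 0.10,
                   low_q_cutoff: int = 7, max_low_q_frac: float = 0.65) -> bool:
    """Criteria 1 and 2; use max_low_q_frac=0.80 for large-insert libraries."""
    for seq, quals in ((pair.seq1, pair.qual1), (pair.seq2, pair.qual2)):
        if frac_unknown(seq) > max_n_frac:
            return False
        if frac_low_quality(quals, low_q_cutoff) > max_low_q_frac:
            return False
    return True


def filter_pairs(pairs: Iterable[ReadPair], **thresholds) -> Iterator[ReadPair]:
    """Criterion 5 (identical pairs treated as PCR duplicates), then 1 and 2."""
    seen = set()
    for pair in pairs:
        key = (pair.seq1, pair.seq2)
        if key in seen:
            continue  # assumption: keep the first copy, drop later duplicates
        seen.add(key)
        if passes_quality(pair, **thresholds):
            yield pair
```

Passing `max_low_q_frac=0.80` reproduces the looser threshold stated for large-insert-size libraries.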

We built 42,528 BACs in 443 96-well plates, producing a total of 475 Gbp. For each BAC, the

9 average coverage was 63.63x, which was sufficient for a high-quality BAC assembly. 

10 2.2. Genome property estimation by k-mer 

11 A preliminary survey was performed to gain insights into the properties of the L. crocea 

12 genome by k-mer analyses. The 17-mer analysis suggested that the L. crocea genome was 

13 691 Mb (Supplemental Fig. S2). The k-mer distribution was bimodal, and the k-mer depth 

14 of the first peak (22) was half that of the second (44), implying that the genome of L. crocea 

15 was rich in heterogeneous sites, which is a serious obstacle for short-read assemblies. 
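
The k-mer estimate can be illustrated with the relation given in the legend of Supplemental Fig. S2, G = K_num/K_depth, where a read of L bp contributes (L-K+1) k-mers. The sketch below is a hedged example: the function names are hypothetical, and the choice of which peak depth to use for a bimodal distribution is left to the caller.

```python
from typing import Dict


def kmers_per_read(read_len: int, k: int) -> int:
    """A read of L bp contains (L - K + 1) k-mers."""
    return max(read_len - k + 1, 0)


def genome_size_from_reads(n_reads: int, read_len: int, k: int,
                           peak_depth: float) -> float:
    """G = K_num / K_depth, with K_num derived from equal-length reads."""
    return n_reads * kmers_per_read(read_len, k) / peak_depth


def genome_size_from_histogram(histo: Dict[int, int], peak_depth: float,
                               min_depth: int = 1) -> float:
    """G = K_num / K_depth, with K_num summed from a depth -> count histogram.

    `min_depth` can be raised to skip low-depth k-mers, which mostly
    reflect sequencing errors (an optional assumption, off by default).
    """
    k_num = sum(depth * count for depth, count in histo.items()
                if depth >= min_depth)
    return k_num / peak_depth
```

The histogram form matches the two-column output of a k-mer counter (depth, number of distinct k-mers at that depth).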

16 2.3. Genome assembly by a BAC-to-BAC strategy 

The BAC-to-BAC strategy resolves heterogeneous sites and repetitive elements within a BAC, which greatly improves the assembly. In the L. crocea genome project, each BAC was assembled by SOAPdenovo with different K values, and the assembly with the largest N50 was taken as optimal. For all BACs, sequences longer than 500 bp were passed to Rabbit (ftp://ftp.genomics.org.cn/pub/Plutellaxylostella/Rabbit_linux-2.6.18-194.blc.tar.gz). Rabbit uses BLAT to find overlaps between different BACs, and then merges and links different BACs to obtain a consensus assembly. The configuration file for Rabbit was as follows:

[sequence] BAC.fa
[recursive] 1
[genome_size] 340000000
[cpu] 30
[min_len] 2000
[trim_end] 40
[devide_size] 100000000
[find_queue] bc.q test
[overlap_queue] bc.q test
[ovl] 5001 4000 0.9 0.9 ov_5001
[ovl] 3001 2500 0.9 0.9 ov_3001
[ovl] 100 90 0.9 0.6 ov_301

Then, Jellyfish was used to determine the frequency of k-mers for the ~50x WGS reads from single individuals, using the following commands:

gunzip -c wgs.fq.gz | jellyfish count -m 17 -o BAC --timing BAC.time -s 1073741824 -t 32 -c 8 -C /dev/fd/0 1>BAC.log 2>BAC.error
jellyfish merge -v -o BAC.jf BAC_* 1>>BAC.log 2>>BAC.error
jellyfish dump -c -t -o BAC.dump BAC.jf 1>>BAC.log 2>>BAC.error
jellyfish stats -o BAC.stats BAC.jf 2>>BAC.error
jellyfish histo -t 32 BAC.jf | sed 's/ /\t/g' >BAC.histo

After obtaining the k-mer occurrence frequency table "BAC.dump", the redundant sequences were removed by using the following command:

Tool/Duplicate/TrimDup BAC.dump Uscaf.fa clean 0.3

After that, scaffolds were constructed with SSPACE-V2.1 (Boetzer et al. 2011). The paired-end information from large-insert-size (2 kbp, 5 kbp, 10 kbp, 20 kbp, and 40 kbp) libraries was used to orient the contigs into scaffolds. Finally, gaps in the assembly were filled by GapCloser (Luo et al. 2012). The contig N50 size was 63.11 kbp and the scaffold N50 size was 1.03 Mbp (Supplemental Table S5).
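
Because N50 is used both to choose the best K for each BAC and to report the final assembly, a short sketch of the statistic may help. It follows the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length); the helper `best_k_by_n50` is a hypothetical illustration of the K selection step, not the authors' script.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        return 0
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    return 0  # unreachable for non-empty input


def best_k_by_n50(assemblies: Dict[int, List[int]]) -> int:
    """Pick the k-mer size whose assembly has the largest N50."""
    return max(assemblies, key=lambda k: n50(assemblies[k]))
```

For example, contigs of 2, 3, 4, 5, and 6 kbp total 20 kbp; the two longest (6 + 5) already cover half, so the N50 is 5 kbp.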

8 2.4. Genome assembly evaluation 

To examine the integrity of the assembly, the clean reads from the 170-bp and 500-bp libraries were aligned to the assembly by using BWA (Li and Durbin 2009). In total, 95.63% of reads were aligned to the L. crocea assembly. The single-base depth distribution was calculated (Supplemental Fig. S3), and a peak was observed at half the value of the expected peak of 52x, suggesting redundancy in the assembly. Furthermore, the scaffold sequences with a depth of less than 26x were checked. However, those sequences totaled 3.4 Mb and contained 102 genes (0.04% of total genes). The transcript sequences from the transcriptomes of the eleven mixed tissues (Supplemental note 5) were aligned to the L. crocea assembly by BLAT (Kent 2002) with default parameters to examine the completeness of the expressed regions of the genome.

19 Supplemental note 3: Evolutionary analysis 

20 3.1. Gene family analysis 

To detect variations in the L. crocea genome, we chose nine species (Larimichthys crocea, Gasterosteus aculeatus, Takifugu rubripes, Tetraodon nigroviridis, Oryzias latipes, Gadus morhua, Danio rerio, Gallus gallus, and Homo sapiens) for comparison. Proteins longer than 50 amino acids were aligned by BLAST (Altschul et al. 1990) (-p blastp -e 1e-7), and Treefam (Ruan et al. 2008) was used to construct gene families. A total of 19,283 gene families were identified in L. crocea and the other eight species. The 25,387 genes observed in the L. crocea genome belonged to 14,698 gene families, of which 215 gene families (containing 521 genes) were specific to L. crocea (Supplemental Table S13).

7 3.2. Phylogeny and divergence time estimation 

To determine the phylogeny of L. crocea, 2,257 single-copy genes from the gene family analysis were aligned by using MUSCLE (Edgar 2004), and the alignments were concatenated as a single data set. A total of 5,319,909 nucleic acid sites were obtained from the alignment. To reduce topological errors in the phylogeny caused by alignment inaccuracies, Gblocks (Castresana 2000) (codon model) was used to remove unreliably aligned sites and gaps in the alignment. This method produced 3,180,303 reliable coding sites for phylogeny analysis and 87,943 4-fold degenerate sites (neutral substitution rate per year) to estimate divergence time.

To construct a phylogenetic tree of the nine vertebrate species, the 3,180,303 nucleic acid sites were supplied to TreeBeST and PhyML (Yang 1997). The 87,943 4-fold degenerate sites were used to estimate divergence time by using mcmctree in the PAML 3.0 package (Yang 1997).
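
A 4-fold degenerate site is a third codon position at which any of the four nucleotides encodes the same amino acid. The sketch below shows one way such sites could be located in a single coding sequence, assuming the standard genetic code; the function name is hypothetical, and the authors' actual extraction from the multi-species alignment (where the site must be 4-fold degenerate in every species) is not described in the text.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Two-base prefixes for which the third base never changes the amino acid.
FOURFOLD_PREFIXES = {
    prefix
    for prefix in ("".join(p) for p in product(BASES, repeat=2))
    if len({CODON_TABLE[prefix + base] for base in BASES}) == 1
}


def fourfold_sites(cds: str) -> list:
    """Return 0-based positions of 4-fold degenerate sites in an in-frame CDS."""
    cds = cds.upper()
    positions = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in BASES:
            positions.append(start + 2)
    return positions
```

Under the standard code, eight codon families qualify (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).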

19 3.3. Gene family expansion and contraction 

Gene family expansion and contraction analyses were performed by CAFE (De Bie et al. 2006). A random birth and death model was used in CAFE to study gene gain and loss in gene families across a user-specified phylogenetic tree. A global parameter λ (lambda), which described both the gene birth (λ) and death (μ = −λ) rates across all branches in the tree for all gene families, was estimated by using the maximum likelihood method. A conditional P-value was calculated for each gene family, and families with conditional P-values less than 0.01 were considered to have a significantly accelerated rate of expansion and contraction (Supplemental Tables S14-S15).

6 3.4. Positively selected genes in L. crocea 

To determine gene orthology, all protein sequences from six species (Larimichthys crocea, Gasterosteus aculeatus, Danio rerio, Oryzias latipes, Takifugu rubripes, and Tetraodon nigroviridis) were aligned by BLAST (Mount 2007) (-p blastp -e 1e-5 -m 8), and the alignments were linked by solar to join fragments separated by local alignment. Alignments with identity >50% and alignment coverage >50% that were also reciprocal best hits were defined as orthologous between L. crocea and other species. Then, the orthology of the six species was determined according to the L. crocea orthology. A total of 2,346 genes were identified as orthologous genes in the six species.
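
The reciprocal-best-hit rule described above can be sketched as follows. This is an illustrative outline under stated assumptions: the `Hit` record and function names are hypothetical, identity and coverage are taken as percentages, and the "best" hit is assumed to be the one with the highest alignment score among hits passing both cutoffs.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """Hypothetical summary of one linked alignment."""
    query: str
    subject: str
    identity: float  # percent identity
    coverage: float  # percent alignment coverage
    score: float     # alignment score used to rank hits


def best_hits(hits: Iterable[Hit], min_identity: float = 50.0,
              min_coverage: float = 50.0) -> Dict[str, str]:
    """Map each query to its highest-scoring subject passing both cutoffs."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.identity <= min_identity or hit.coverage <= min_coverage:
            continue
        if hit.query not in best or hit.score > best[hit.query].score:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit],
                         **cutoffs) -> Set[Tuple[str, str]]:
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, **cutoffs)
    backward = best_hits(b_vs_a, **cutoffs)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}
```

Running this for L. crocea against each of the other five species and intersecting the results on the L. crocea gene would give the six-species orthologous set.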

Inference of positive selection generally takes the multiple sequence alignment input for granted, regardless of uncertainties in the alignment. Because alignment error is an important concern in molecular data analyses, alignments were made by using PRANK (Loytynoja and Goldman 2010) in the GUIDANCE (Penn et al. 2010) pipeline. GUIDANCE filters and masks unreliably aligned positions in sequence alignments before subsequent analysis. The codon sequences (nucleotide sequences coding for proteins) were aligned by using PRANK, columns with low GUIDANCE scores were removed, and the remaining alignment was used to infer positive selection based on the branch-site dN/dS test by codeml in the PAML 3.0 package (Yang 1997).

2 Supplemental note 4: Genomic basis for behavioral and physiological characteristics 

3 4.1. Photosensitivity 

We used the protein sequences encoded by vision-related genes in zebrafish to search the genomes of L. crocea and five other teleosts using BLAST, and found that the copy numbers of some visual genes were expanded in the L. crocea genome (Supplemental Table S17). Crystallins are the major structural components of the ocular lens and are necessary for maintaining its transparency. They are also important for the survival and regeneration of retinal ganglion cells (Piri et al. 2013). In our current study, several crystallin genes, such as crygm2b, cryba1, and crybb3, were expanded, with higher copy numbers in L. crocea than in other sequenced teleosts. The major and most abundant protein present in the ocular lens of most teleosts is gamma-crystallin (Pan et al. 1995), among which crygamlb in the L. crocea genome was remarkably expanded compared with other teleosts (12 vs. 3-8 copies; Supplemental Fig. S6). The specific expansion of these crystallin genes may help improve photosensitivity by increasing lens transparency, thereby enabling the fish to easily find food and avoid predation underwater. Retinoid dehydrogenases/reductases (RDHs) can reduce the reactive aldehyde from photo-activated rhodopsin by converting all-trans-retinal to all-trans-retinol. Rhodopsin is expressed in rod cells, which are used for dim-light vision. After light exposure, RDH12 in the inner segments of photoreceptors can reduce the retinal leak and toxic aldehydes (Chen et al. 2012). Therefore, the expansion of the RDH12 gene (rdh12) in L. crocea would be useful for protecting rhodopsin and the retina.

4.2. Olfaction

Chemosensation is essential for survival. The chemical senses are responsible for detecting molecules of immense chemical variety, which requires a massive repertoire of receptors to match the diversity of chemical structures (Mombaerts 2004). The olfactory/odorant receptors (ORs) are the most important chemosensory receptors in detecting environmental chemicals, and they detect a wide range of compounds (Zhou et al. 2011). L. crocea possesses powerful olfactory abilities, as numerous cilia and microvilli are widely distributed on the olfactory epithelia. The odorant receptor repertoire of vertebrate species, including teleost fishes, has been extensively reported (Alioto and Ngai 2005; Niimura 2009b; Niimura 2009a). Here, we characterized the OR-like genes in the L. crocea genome.

We downloaded the OR-like genes from NCBI and used these for homology searches against the genomes of large yellow croaker, Atlantic cod, zebrafish (Zv9), medaka, stickleback, Japanese pufferfish, and green spotted pufferfish using TBLASTN with E-value <1E-5. We chose alignments with coverage >30% and identity >30% and extended 5 kbp on both ends of every alignment. GeneWise 2.2.0 was employed to predict gene structures and open reading frames (ORFs). A sequence was discarded if it contained at least one premature stop codon or frame shift. The remaining predicted sequences were re-checked by BLAST searches against the Swiss-Prot database (UniProt Consortium 2014). Only those proteins that gave an 'Olfactory receptor' hit and had a length greater than 270 amino acids were retained and defined as functional OR-like genes. Supplemental Table S18 shows the number and classification of identified functional OR-like genes in the seven teleost genomes. The zebrafish has the largest number of functional OR-like genes (~152), possibly because of one more round of genome duplication in zebrafish, and the green spotted pufferfish has the fewest (~44), which agrees with previous studies (Niimura 2009b; Zhou et al. 2011).

We identified 112 functional OR-like genes in the L. crocea genome (Supplemental Table S18), consistent with a previous report in which 111 OR genes were found to be expressed in the olfactory epithelial tissues of L. crocea by transcriptome analysis (Zhou et al. 2011). A homologous assignment method was used to classify the putative OR-like genes by BLASTP against previously predicted vertebrate OR genes belonging to different groups (Niimura 2009b). Based on the nomenclature of Niimura, the majority of these genes (66 in L. crocea) were classified into the "delta" group, which is involved in the perception of water-borne odorants. L. crocea also possessed the highest number of genes classified into the "eta" group (30, P < 0.001, Supplemental Table S18), and these genes may contribute to the olfactory detection abilities, which could be useful for feeding and migration (Li et al. 1995).

An OR data set was prepared using putative OR proteins from the seven fish genomes above. Six G protein-coupled receptors (GPCRs), alpha-1B-adrenergic receptor (NP_000670.1), cholinergic receptor, muscarinic 1 (NP_000729.2), somatostatin receptor 5 (NP_001044.1), chemokine-binding protein 2 (NP_001287.2), GPCR 35 (NP_005292.2), and GPCR G2A (NP_037477.1), were also included to serve as outgroups. PhyML was used to generate a maximum likelihood (ML) phylogenetic tree. The ML trees were viewed and edited using evolview (Guindon et al. 2010; Zhang et al. 2012). A circular cladogram for the "eta" group is shown in Supplemental Fig. S7.
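
The filtering rules used to define functional OR-like genes (alignment cutoffs, no premature stop codon or frame shift, an 'Olfactory receptor' Swiss-Prot hit, and a length greater than 270 amino acids) can be summarized in a short sketch. The `Candidate` record and its fields are hypothetical stand-ins for parsed TBLASTN, GeneWise, and BLAST output, so this shows the decision logic only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one predicted OR-like gene."""
    protein: str          # predicted translation; "*" marks a stop codon
    has_frameshift: bool  # frame shift reported by the gene predictor
    best_hit: str         # description of the best Swiss-Prot hit


def passes_alignment_cutoffs(coverage_pct: float, identity_pct: float,
                             min_cov: float = 30.0, min_id: float = 30.0) -> bool:
    """Keep homology-search alignments with coverage >30% and identity >30%."""
    return coverage_pct > min_cov and identity_pct > min_id


def has_premature_stop(protein: str) -> bool:
    """True if a stop codon occurs before the end of the translation."""
    return "*" in protein.rstrip("*")


def is_functional_or(candidate: Candidate, min_len: int = 270) -> bool:
    """Apply the functional OR-like gene criteria described in the text."""
    if candidate.has_frameshift or has_premature_stop(candidate.protein):
        return False
    if "olfactory receptor" not in candidate.best_hit.lower():
        return False
    return len(candidate.protein.rstrip("*")) > min_len
```

Counting the candidates for which `is_functional_or` returns true, per species, would yield totals comparable to those in Supplemental Table S18.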

20 4.3. Sound perception 

L. crocea is named for its ability to generate strong repetitive drumming sounds, especially during reproduction. For good communication, fish have developed high sensitivities to environmental sound. Several important auditory genes, such as otoferlin (OTOF), claudin j, and otolin 1 (OTOL1), were significantly expanded in the L. crocea genome (Supplemental Table S19). OTOF is a key calcium ion sensor involved in Ca2+-triggered synaptic vesicle-plasma membrane fusion. Claudin j and otolin-1 are essential for the formation of the otoliths and normal ear function, as reported in studies of zebrafish and medaka. These expansions may contribute to the detection of sound signals during communication, and thus to reproduction and survival.

8 4.4. Selenoproteins 

L. crocea is rich in selenium (Se; ~43 g/kg). Se is an antioxidant and an essential microelement in mammals, as it plays an important role in mitigating oxidative damage of membranes (Huang et al. 2012). It is mainly present as selenoproteins. Selenoproteins play roles in regulating metabolic activity, antioxidant defence, immune function, and intracellular redox modulation (Tinggi 2008; Papp et al. 2010). The active site of each selenoprotein contains selenocysteine (Sec), which is encoded by UGA, a stop signal in the canonical genetic code. UGA can be translated into a Sec residue when a stem-loop structure, the Sec insertion sequence (SECIS) element, is located in the 3'-untranslated region (UTR) of a selenoprotein gene in eukaryotes and archaea, or immediately downstream of the Sec-decoding TGA (designated as Sec-TGA) in bacteria (Kryukov et al. 1999; Atkins and Gesteland 2000; Bock 2000; Hatfield and Gladyshev 2002). Here, the algorithm SelGenAmic (Jiang et al. 2010) was used to predict selenoprotein genes in the L. crocea genome.
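
Because Sec is encoded by an in-frame TGA that would otherwise be read as a stop, the first step in spotting a candidate is locating internal in-frame TGA codons. The sketch below illustrates only that step; it is not the SelGenAmic algorithm, the function name is hypothetical, and a real prediction would also need a SECIS element in the 3'-UTR and homology support.

```python
from typing import List


def in_frame_tga_codons(cds: str) -> List[int]:
    """Return 0-based codon indices of internal in-frame TGA codons.

    The final codon is skipped because a terminal TGA is an ordinary stop.
    Out-of-frame occurrences of "TGA" are ignored.
    """
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [
        index
        for index in range(n_codons - 1)
        if cds[3 * index:3 * index + 3] == "TGA"
    ]
```

For instance, in `ATGTGAAAATGA` the second codon is a candidate Sec-TGA, while the terminal TGA is treated as the stop codon.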

Forty selenoprotein genes were identified in the L. crocea genome (Supplemental Table S20), which is the highest number among all sequenced vertebrates so far (Mariotti et al. 2012). The variety of the selenoproteome in L. crocea was similar to those of other fish species, and only a few selenoprotein families were not shared in common. Interestingly, five copies of MsrB1, which encodes methionine sulfoxide reductase, were found in L. crocea (MsrB1a, MsrB1b, MsrB1c, MsrB1d, and MsrB1e), whereas only two copies (MsrB1a and MsrB1b) were found in other fish, suggesting a broader specificity to reduce all possible substrates. Fep15, encoding a new selenocysteine-containing member of the Sep15 protein family, has been identified only in fish (Novoselov et al. 2006), but seemed to be absent in the L. crocea genome. Fish have more selenoprotein genes than other vertebrates (Mariotti et al. 2012). Several selenoprotein gene families were duplicated in teleost fish but not in other investigated vertebrates, most likely owing to the whole-genome duplication in the early evolution of ray-finned fish (Taylor et al. 2003; Mariotti et al. 2012).

12 Supplemental note 5: Transcriptome sequencing and analysis 

13 5.1. Transcriptome of the wild male and female L. crocea 

14 For accurate gene annotation of the reference assembly, the RNAseq data were generated 

15 from eleven tissues (ovary [or testis from male], stomach, kidney, heart, gill, skin, brain, eyes, 

16 spleen, intestines, and liver) of a female or a male wild L. crocea (120-130 g) obtained from 

17 the Sanduao Sea area, Ningde, China. Total RNA was extracted by using Trizol Reagent 

18 (Invitrogen, USA) and digested by RNase-free DNase I (TaKaRa, China) to remove genomic 

DNA. The total RNA from different tissues was mixed in equal proportions.

20 Primary sequencing data produced by the Illumina HiSeq 2000, called raw reads, were 

21 subjected to quality control (QC) that determined if an RNA resequencing step was needed. 

22 After QC, the clean reads were screened from the raw reads and aligned to the L. crocea 
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1 genome by using SOAPaligner/SOAP2 (Li et al. 2009). The alignment was utilized to 

2 calculate the distribution of reads on reference genes and to perform coverage analysis. If an 

3 alignment result passed QC (alignment ratio >70%), we proceeded with subsequent analysis, 

4 including gene expression calculation and differential expression comparison. 

5 5.2. Detection of differentially expressed genes 

The gene expression levels were calculated by using the RPKM method (reads per kilobase of transcript per million mapped reads). According to the methodology of Audic and Claverie (Audic and Claverie 1997), a strict algorithm was developed to identify differentially expressed genes between two samples, as follows.

The number of unambiguous clean reads (i.e., the RNAseq reads) from gene A was denoted as x. Given that every gene's expression occupies only a small part of the library, x follows a Poisson distribution:

$$p(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

(λ is the real transcripts of the gene)

The total clean read number of sample 1 is $N_1$, and the total clean read number of sample 2 is $N_2$; gene A holds x reads in sample 1 and y reads in sample 2. The probability that gene A was expressed equally between two samples can be calculated as:

$$2\sum_{i=0}^{y} p(i \mid x)$$

or

$$2\left(1 - \sum_{i=0}^{y} p(i \mid x)\right) \quad \text{if } \sum_{i=0}^{y} p(i \mid x) > 0.5$$
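
A hedged numerical sketch of this test follows. It takes p(y|x) to be the conditional distribution given by Audic and Claverie (1997), p(y|x) = (N2/N1)^y (x+y)! / [x! y! (1 + N2/N1)^(x+y+1)], and evaluates it in log space to avoid overflow; the function names are hypothetical. For extremely small P-values, summing the upper tail directly would be more precise than computing 1 minus the cumulative sum.

```python
import math


def log_p_y_given_x(y: int, x: int, n1: float, n2: float) -> float:
    """Log of the Audic-Claverie conditional probability p(y | x)."""
    ratio = n2 / n1
    return (
        math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
        + y * math.log(ratio)
        - (x + y + 1) * math.log1p(ratio)
    )


def audic_claverie_p(x: int, y: int, n1: float, n2: float) -> float:
    """Two-sided P-value that gene A is equally expressed in both samples.

    x, y   : read counts of the gene in sample 1 and sample 2
    n1, n2 : total clean read numbers of sample 1 and sample 2
    """
    cumulative = sum(math.exp(log_p_y_given_x(i, x, n1, n2))
                     for i in range(y + 1))
    cumulative = min(cumulative, 1.0)  # guard against rounding above 1
    if cumulative > 0.5:
        return 2.0 * (1.0 - cumulative)
    return 2.0 * cumulative
```

With equal library sizes, equal counts give a P-value near 1, while a tenfold difference in counts gives a P-value far below the 0.001 threshold.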



The P-value corresponds to the differential gene expression test. Because differentially expressed gene (DEG) analysis tests thousands of hypotheses (gene x is differentially expressed between the two groups) simultaneously, correction for false positives (type I errors) and false negatives (type II errors) was performed using the false discovery rate (FDR) method (Benjamini and Yekutieli 2001). FDR<0.001 and an absolute value of Log2Ratio>1 were used as the threshold to judge the significance of gene expression differences.
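
The multiple-testing correction and the two thresholds can be sketched as below. The adjustment follows the Benjamini-Yekutieli step-up procedure; whether the authors used exactly this variant, and how genes with zero RPKM were handled, are assumptions, and all function names are hypothetical.

```python
import math
from typing import List


def benjamini_yekutieli(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Yekutieli adjusted P-values in the input order."""
    m = len(pvalues)
    if m == 0:
        return []
    harmonic = sum(1.0 / k for k in range(1, m + 1))  # c(m) correction factor
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m * harmonic / rank)
        adjusted[index] = running_min
    return adjusted


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return 1e9 * read_count / (gene_length_bp * total_mapped_reads)


def is_deg(fdr: float, rpkm1: float, rpkm2: float,
           fdr_cutoff: float = 0.001, min_abs_log2: float = 1.0) -> bool:
    """Apply FDR<0.001 and |Log2Ratio|>1; zero-RPKM genes are skipped here."""
    if rpkm1 <= 0 or rpkm2 <= 0:
        return False  # assumption: the ratio is undefined, so do not call a DEG
    return fdr < fdr_cutoff and abs(math.log2(rpkm2 / rpkm1)) > min_abs_log2
```

In practice the P-values from the differential expression test would be adjusted once per comparison, and `is_deg` applied gene by gene.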

8 Supplemental note 6: Mucus proteome analysis 

9 6.1. Preparation of mucus proteins 

Skin mucus was collected from six healthy L. crocea individuals under air exposure as previously described (Subramanian et al. 2008). Briefly, the fish were anesthetised with a sub-lethal dose of Tricaine-S (100 mg/L) and transferred gently to a sterile plastic bag for 3 min to slough off the mucus under air exposure. Proteins were extracted from a pool of skin mucus of the six fish by the trichloroacetic acid-acetone precipitation method and then digested with Trypsin Gold (Promega, USA) at a protein:trypsin ratio of 20:1.

Peptides were separated by SCX chromatography using the Shimadzu LC-20AB HPLC Pump system. The digested peptides were reconstituted with 4 mL of buffer A (25 mM NaH2PO4 in 25% ACN, pH 2.7) and loaded onto a 4.6 x 250 mm Ultremex SCX column containing 5-μm particles (Phenomenex). The peptides were eluted at a flow rate of 1 mL/min with a gradient of buffer A for 10 min, 5-35% buffer B (25 mM NaH2PO4, 1 M KCl in 25% ACN, pH 2.7) for 11 min, and 35-80% buffer B for 1 min. The system was maintained in 80% buffer B for 3 min before equilibrating with buffer A for 10 min. Elution was monitored by measuring absorbance at 214 nm, and fractions were collected every 1 min. The eluted peptides were pooled as 20 fractions, desalted by a Strata X C18 column (Phenomenex), and vacuum-dried.

4 6.2. LC-MS/MS analysis and Protein Identification 

Separation and purification were performed based on the TripleTOF 5600. Each fraction was resuspended in a certain volume of buffer A (2% ACN, 0.1% FA) and centrifuged at 20,000 g for 10 min. In each fraction, the final peptide concentration was about 0.5 μg/μL on average. Ten microliters of supernatant was loaded on a Shimadzu LC-20AD nanoHPLC by the autosampler onto a 2 cm C18 trap column (inner diameter 200 μm), and the peptides were eluted onto a resolving 10 cm analytical C18 column (inner diameter 75 μm) made in-house. The samples were loaded at 15 μL/min for 4 min, then the 44 min gradient was run at 400 nL/min starting from 2 to 35% B (98% ACN and 0.1% FA), followed by a 2 min linear gradient to 80%, maintenance at 80% B for 4 min, and finally a return to 2% in 1 min. Data acquisition was performed with a TripleTOF 5600 System (AB SCIEX, Concord, ON) fitted with a Nanospray III source (AB SCIEX, Concord, ON). Data were acquired using an ion spray voltage of 2.5 kV, curtain gas of 30 PSI, nebulizer gas of 15 PSI, and an interface heater temperature of 150 °C. The MS was operated with a resolving power (RP) of greater than or equal to 30,000 FWHM for TOF MS scans. For information-dependent acquisition (IDA), survey scans were acquired in 250 ms and as many as 30 product ion scans were collected if they exceeded a threshold of 120 counts per second (counts/s) with a 2+ to 5+ charge state. Total cycle time was fixed to 3.3 s. The Q2 transmission window was 100 Da for 100%. Four time bins were summed for each scan at a pulser frequency value of 11 kHz through monitoring of the 40 GHz multichannel TDC detector with four-anode channel detection. A sweeping collision energy setting of 35±5 eV with adjusted rolling collision energy was applied to all precursor ions for collision-induced dissociation. Dynamic exclusion was set for 1/2 of peak width (18 s), and then the precursor was refreshed off the exclusion list.

Peptide and protein identification: All spectra were searched with Mascot server version 2.3.02 against the database of the L. crocea genome with the following parameters: peptide mass tolerance 0.05 Da; fragment mass tolerance 0.1 Da; fixed modifications "Carbamidomethyl (C)"; variable modifications "Gln->pyro-Glu (N-term Q), Oxidation (M), Deamidated (NQ)". A total of 25,026 peptides were identified, belonging to 4,489 proteins encoded by the L. crocea genome. For further analyses of the function of the mucus proteome, proteins with more than two unique peptides were selected (Supplemental Table S27).
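
The final selection step (proteins supported by more than two unique peptides) can be sketched as follows. This assumes that a "unique" peptide is one matching exactly one protein and that the input is a mapping from peptide sequence to the set of matching protein IDs; both are illustrative assumptions rather than a description of the Mascot output format, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def select_proteins(peptide_to_proteins: Dict[str, Set[str]],
                    min_unique: int = 3) -> List[str]:
    """Return proteins supported by at least `min_unique` unique peptides.

    The default of 3 corresponds to "more than two unique peptides".
    """
    unique_counts: Counter = Counter()
    for proteins in peptide_to_proteins.values():
        if len(proteins) == 1:  # peptide maps to a single protein
            unique_counts[next(iter(proteins))] += 1
    return sorted(protein for protein, count in unique_counts.items()
                  if count >= min_unique)
```

Shared peptides (those matching several proteins) are ignored when counting support in this sketch.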
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